Difference between revisions of "Specifications for source files"
Latest revision as of 17:08, 16 April 2013
Specifications for Textpresso for CCC Source Files
- The source files can be simplified a bit, but will retain the key information we need for curation and for the search and retrieval functions of the curation form.
- The format will continue to be a tab-delimited file containing, in order:
  1. SSC (stands for Textpresso Sentence SCore):Numerical Textpresso sentence score value
  2. Paper database code:numerical identifier:Paper section:Sentence ID in Textpresso document:corpus_date_of_search
    - For WormBase and dictyBase, which send annotations to the Protein2GO tool via web services, this identifier needs to be either a PubMed ID or a DOI.
    - If neither a PubMed ID nor a DOI exists, the annotation cannot be sent to Protein2GO.
    - For TAIR, which is not yet using Protein2GO, it can still be the TAIR document ID.
  3. Gene product name or synonym as identified by the Textpresso search (names below are as seen on the Textpresso web sites)
    - WormBase: protein (C. elegans)
    - dictyBase: dicty gene
    - TAIR: gene (arabidopsis)
  4. Textpresso component category match (names below are as seen on the Textpresso web sites)
    - WormBase: CCC cellular component 2011-02-11
    - dictyBase: CCC TAIR (unconfirmed: we still need to confirm this is the right category for dictyBase)
    - TAIR: CCC TAIR
  5. Textpresso sentence (marked-up version)
Example sentence:
SSC:6<TAB>PMID:12345678:Abstract:17:dicty_20130411<TAB>DPY-27,DPY-30<TAB>chromosome, chromosomes, nuclear<TAB><The marked-up Textpresso sentence that doesn't have any tabs>
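As an illustration of how a line in this format could be read, here is a minimal Python sketch. It is not part of the Textpresso or Protein2GO code: the names `SourceRecord`, `parse_line` and `can_send_to_protein2go` are hypothetical, and it assumes that a DOI would be written with the database code `DOI` (only `PMID` appears in the example above). The paper identifier is split from the right so that an identifier containing a colon would still parse.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourceRecord:
    """One line of a CCC source file (hypothetical container, for illustration)."""
    score: float
    db_code: str          # e.g. "PMID"; "DOI" is an assumed spelling
    paper_id: str
    section: str
    sentence_id: str
    corpus: str           # corpus_date_of_search, e.g. "dicty_20130411"
    gene_products: List[str]
    component_terms: List[str]
    sentence: str


def _split_commas(field: str) -> List[str]:
    # Items may be separated by "," or ", "; strip whitespace and drop empties.
    return [item.strip() for item in field.split(",") if item.strip()]


def parse_line(line: str) -> SourceRecord:
    """Parse one tab-delimited line with the five fields described above."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-delimited fields, got {len(fields)}")
    ssc, ident, genes, components, sentence = fields

    label, _, value = ssc.partition(":")
    if label != "SSC":
        raise ValueError(f"first field should start with 'SSC:', got {ssc!r}")

    # Database code comes first; the last three parts are section,
    # sentence ID and corpus, so everything in between is the identifier.
    db_code, _, rest = ident.partition(":")
    parts = rest.rsplit(":", 3)
    if len(parts) != 4:
        raise ValueError(f"malformed paper field: {ident!r}")
    paper_id, section, sentence_id, corpus = parts

    return SourceRecord(
        score=float(value),
        db_code=db_code,
        paper_id=paper_id,
        section=section,
        sentence_id=sentence_id,
        corpus=corpus,
        gene_products=_split_commas(genes),
        component_terms=_split_commas(components),
        sentence=sentence,
    )


def can_send_to_protein2go(record: SourceRecord) -> bool:
    # Per the spec, only PubMed IDs or DOIs are accepted by Protein2GO.
    # The exact database-code strings checked here are an assumption.
    return record.db_code.upper() in {"PMID", "DOI"}
```

With the example line above (using real tab characters in place of <TAB>), `parse_line` would yield the gene products DPY-27 and DPY-30 and the component terms chromosome, chromosomes and nuclear, and `can_send_to_protein2go` would return True because the database code is PMID. A record carrying a TAIR document ID would return False under the same assumption.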
Web page on Protein2GO web services: http://www.ebi.ac.uk/seqdb/confluence/display/GOAP/Protein2GO+Web+Services
Back to CCC Form 2.0 Specifications