Specifications for source files
From WormBaseWiki
Revision as of 17:24, 26 March 2013
Specifications for Textpresso for CCC Source Files
- The source files can be simplified somewhat but will retain the key information needed for curation and for the search and retrieval functions of the curation form.
- The format will continue to be a tab-delimited file containing, in order:
  - Sentence number in the source file, starting at 1 and ending at the total number of sentences in the file
  - SSC (stands for Textpresso Sentence SCore)
  - Numerical Textpresso sentence score value
  - PID (stands for Paper IDentifier)
  - Database code:numerical identifier
    - For WormBase and dictyBase, which send annotations to the Protein2GO tool via web services, this identifier must be either a PubMed ID or a DOI
      - If neither a PubMed ID nor a DOI exists, the annotation cannot be sent to Protein2GO
    - For TAIR, which is not yet using Protein2GO, the identifier can still be the TAIR document ID
  - SID (stands for Textpresso Sentence IDentifier)
  - Numerical sentence number within the document, as indexed in Textpresso
  - Gene product name or synonym as identified by the Textpresso search (category names below are as they appear on the Textpresso websites)
    - WormBase: protein (C. elegans)
    - dictyBase: dicty gene
    - TAIR: gene (arabidopsis)
  - Textpresso component category match (category names below are as they appear on the Textpresso websites)
    - WormBase: CCC cellular component 2011-02-11
    - dictyBase: CCC TAIR (still to be confirmed as the right category for dictyBase)
    - TAIR: CCC TAIR
  - Textpresso sentence (marked-up version)
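The PID rules above can be expressed as a small check. This is a minimal Python sketch, not part of the specification: the names `split_pid`, `pid_is_acceptable`, `PROTEIN2GO_MODS`, and `PROTEIN2GO_CODES` are hypothetical, and so is the assumption that DOIs are written with a `doi` database code (the spec's example only shows `PMID:`). The non-PubMed and TAIR identifiers in the usage lines are made up for illustration.

```python
# Minimal sketch of the PID rule. ASSUMPTIONS (not from the spec): PubMed IDs
# use the "PMID" database code (as in the example line), DOIs use a "doi"
# code, and MODs are identified by plain name strings.

PROTEIN2GO_MODS = {"WormBase", "dictyBase"}  # MODs that send annotations to Protein2GO
PROTEIN2GO_CODES = {"pmid", "doi"}           # accepted database codes, compared lowercase


def split_pid(pid: str) -> tuple[str, str]:
    """Split a 'database code:identifier' string at the first colon only."""
    code, sep, ident = pid.partition(":")
    if not sep or not code or not ident:
        raise ValueError(f"PID {pid!r} is not in 'code:identifier' form")
    return code, ident


def pid_is_acceptable(mod: str, pid: str) -> bool:
    """Return True if this PID is usable for the given MOD.

    WormBase and dictyBase need a PubMed ID or a DOI, since annotations
    without one cannot be sent to Protein2GO. Other MODs (e.g. TAIR, which
    does not yet use Protein2GO) may keep their own document IDs.
    """
    code, _ = split_pid(pid)
    if mod in PROTEIN2GO_MODS:
        return code.lower() in PROTEIN2GO_CODES
    return True


print(pid_is_acceptable("WormBase", "PMID:12345678"))       # True
print(pid_is_acceptable("dictyBase", "doi:10.1000/xyz123"))  # True
print(pid_is_acceptable("WormBase", "WBPaper:00012345"))     # False (hypothetical non-PubMed ID)
print(pid_is_acceptable("TAIR", "TAIR:501234567"))          # True (hypothetical TAIR ID)
```

Splitting only at the first colon keeps any colons inside a DOI suffix as part of the identifier.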
Example source file line:
1 SSC 6 PID PMID:12345678 SID 17 GENE1 nuclear The marked-up Textpresso sentence that says GENE1 shows nuclear localization.
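A minimal Python sketch of reading one line in the column order listed above. The wiki renders the example with spaces, but the spec calls for tab-delimited fields, so the sketch rebuilds the example with tabs. The `SourceLine` and `parse_source_line` names are hypothetical, and parsing the score as a float is an assumption, since the spec only says the value is numerical.

```python
from dataclasses import dataclass

N_FIELDS = 10                                     # columns per line, per the field list
EXPECTED_LABELS = {1: "SSC", 3: "PID", 5: "SID"}  # literal labels preceding their values


@dataclass
class SourceLine:
    sentence_number: int  # position in the source file, starting at 1
    score: float          # Textpresso sentence score (assumed numeric, parsed as float)
    pid: str              # paper identifier, "database code:identifier"
    sid: int              # Textpresso sentence identifier within the document
    gene: str             # gene product name or synonym
    category_match: str   # component term matched by Textpresso
    sentence: str         # marked-up Textpresso sentence


def parse_source_line(line: str) -> SourceLine:
    """Parse one tab-delimited line of a CCC source file."""
    # maxsplit keeps any stray tab inside the sentence text in the last field
    fields = line.rstrip("\r\n").split("\t", N_FIELDS - 1)
    if len(fields) != N_FIELDS:
        raise ValueError(f"expected {N_FIELDS} fields, got {len(fields)}")
    for index, label in EXPECTED_LABELS.items():
        if fields[index] != label:
            raise ValueError(f"field {index + 1}: expected {label!r}, got {fields[index]!r}")
    return SourceLine(
        sentence_number=int(fields[0]),
        score=float(fields[2]),
        pid=fields[4],
        sid=int(fields[6]),
        gene=fields[7],
        category_match=fields[8],
        sentence=fields[9],
    )


# The example line from this page, rebuilt with tabs between fields.
example = "\t".join([
    "1", "SSC", "6", "PID", "PMID:12345678", "SID", "17", "GENE1", "nuclear",
    "The marked-up Textpresso sentence that says GENE1 shows nuclear localization.",
])
record = parse_source_line(example)
print(record.pid, record.sid, record.category_match)  # PMID:12345678 17 nuclear
```

Checking the literal SSC, PID, and SID labels catches lines whose columns have shifted, for example when a field is missing.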
Back to CCC Form 2.0 Specifications