Difference between revisions of "Specifications for source files"

From WormBaseWiki
Latest revision as of 17:08, 16 April 2013

Specifications for Textpresso for CCC Source Files

  • The source files can be simplified somewhat, but will retain the key information needed for curation and for the search and retrieval functions of the curation form.
  • The format will continue to be a tab-delimited file containing, in order:
  1. SSC (Textpresso Sentence SCore): the label SSC, a colon, and the numerical Textpresso sentence score value
  2. Paper identifier, colon-separated: paper database code:numerical identifier:paper section:sentence ID in the Textpresso document:corpus_date_of_search
    1. For WormBase and dictyBase, which send annotations to the Protein2GO tool via web services, this identifier needs to be either a PubMed ID or a DOI.
    2. If neither a PubMed ID nor a DOI exists, then the annotation cannot be sent to Protein2GO.
    3. For TAIR, which is not yet using Protein2GO, it can still be the TAIR document ID.
  3. Gene product name or synonym as identified by Textpresso search (names below are as seen on Textpresso web sites)
    1. WormBase: protein (C. elegans)
    2. dictyBase: dicty gene
    3. TAIR: gene (arabidopsis)
  4. Textpresso component category match (names below are as seen on Textpresso web sites)
    1. WormBase: CCC cellular component 2011-02-11
    2. dictyBase: CCC TAIR (unconfirmed: we still need to confirm this is the right category for dictyBase)
    3. TAIR: CCC TAIR
  5. Textpresso sentence (marked up version)
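
The field order above can be sketched as a small Python helper that assembles one source-file line. This is only an illustration, not the script that generates the source files: the function name `format_source_line` and its parameter names are hypothetical, and the separators inside the gene product and component fields (a comma, and a comma plus space) are assumptions taken from the example line on this page.

```python
def format_source_line(score, paper_db, paper_id, section, sentence_id,
                       corpus, gene_products, component_terms, sentence):
    """Join the five fields into one tab-delimited line (hypothetical helper)."""
    # The sentence is the last field and must not contain tabs or newlines,
    # otherwise the line could not be split back into five fields.
    if "\t" in sentence or "\n" in sentence:
        raise ValueError("sentence must not contain tabs or newlines")

    # Field 1: the literal label SSC, a colon, and the score.
    ssc_field = f"SSC:{score}"

    # Field 2: database code, identifier, section, sentence ID and corpus,
    # joined with colons.
    paper_field = ":".join(
        [paper_db, str(paper_id), section, str(sentence_id), corpus]
    )

    # Fields 3 and 4: separators are an assumption based on the example line.
    gene_field = ",".join(gene_products)
    component_field = ", ".join(component_terms)

    return "\t".join(
        [ssc_field, paper_field, gene_field, component_field, sentence]
    )
```

Calling it with the values from the example on this page (score 6, `PMID`, 12345678, `Abstract`, 17, `dicty_20130411`) would produce a line with exactly four tab characters.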

Example sentence:

SSC:6<TAB>PMID:12345678:Abstract:17:dicty_20130411<TAB>DPY-27,DPY-30<TAB>chromosome, chromosomes, nuclear<TAB><The marked-up Textpresso sentence that doesn't have any tabs>
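
A line in this format could be read back with a short Python sketch like the one below. It is a minimal illustration under stated assumptions, not part of the curation form code: the names `SourceLine`, `parse_source_line` and `can_send_to_protein2go` are hypothetical, `PMID` is the database code seen in the example, and `DOI` as a database code is an assumption. The paper field is split once from the left and three times from the right so that an identifier containing colons (as a DOI may) stays intact.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourceLine:
    score: float
    paper_db: str
    paper_id: str
    section: str
    sentence_id: int
    corpus: str
    gene_products: List[str]
    component_terms: List[str]
    sentence: str


def parse_source_line(line: str) -> SourceLine:
    """Split one tab-delimited source line into its five fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-delimited fields, got {len(fields)}")
    ssc, paper, genes, terms, sentence = fields

    # Field 1 looks like "SSC:6".
    label, _, score = ssc.partition(":")
    if label != "SSC":
        raise ValueError(f"first field should start with 'SSC:', got {ssc!r}")

    # Field 2: database code first, then the identifier, then three
    # trailing parts (section, sentence ID, corpus and date).
    paper_db, rest = paper.split(":", 1)
    paper_id, section, sentence_id, corpus = rest.rsplit(":", 3)

    return SourceLine(
        score=float(score),
        paper_db=paper_db,
        paper_id=paper_id,
        section=section,
        sentence_id=int(sentence_id),
        corpus=corpus,
        gene_products=[g.strip() for g in genes.split(",")],
        component_terms=[t.strip() for t in terms.split(",")],
        sentence=sentence,
    )


def can_send_to_protein2go(record: SourceLine) -> bool:
    """True only for PubMed ID or DOI papers ("DOI" code is an assumption)."""
    return record.paper_db.upper() in {"PMID", "DOI"}
```

For the example line, `parse_source_line` would give a record with `paper_db` equal to `PMID` and `sentence_id` equal to 17, and `can_send_to_protein2go` would return `True`; a record carrying a TAIR document ID would return `False`.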

Web page on Protein2GO web services: http://www.ebi.ac.uk/seqdb/confluence/display/GOAP/Protein2GO+Web+Services

Back to CCC Form 2.0 Specifications