Difference between revisions of "Specifications for source files"

From WormBaseWiki

Revision as of 13:49, 26 March 2013

Specifications for Textpresso for CCC Source Files

  • The source files can be simplified somewhat, but will retain the key information needed for curation and for the search and retrieval functions of the curation form.
  • The format will continue to be a tab-delimited file containing, in order:
  1. Sentence number in the source file, starting at 1 and ending at the total number of sentences in the file
  2. SC (stands for Textpresso sentence score)
  3. Numerical Textpresso sentence score value
  4. PID (stands for paper identifier)
  5. Database code:numerical identifier
    1. For WormBase and dictyBase, which send annotations to the Protein2GO tool via web services, this identifier needs to be either a PubMed ID or a DOI
    2. If neither a PubMed ID nor a DOI exists, the annotation cannot be sent to Protein2GO
    3. For TAIR, which is not yet using Protein2GO, it can still be the TAIR document ID
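
As an illustration only, the field order above could be read with a short Python sketch like the one below. It is not part of the specification. It assumes that fields 2 and 4 are the literal labels SC and PID, that any columns after the fifth are passed through untouched, and that the database codes for PubMed IDs and DOIs are spelled PMID and doi; those two spellings, the `SourceLine` class, and both function names are hypothetical and chosen just for this example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical database codes that Protein2GO could accept (PubMed ID or DOI).
# The spellings are assumptions; the specification does not give them.
PROTEIN2GO_CODES = {"pmid", "doi"}


@dataclass
class SourceLine:
    sentence_number: int   # field 1
    score: float           # field 3, the value that follows the SC label
    db_code: str           # field 5, the part before the first colon
    identifier: str        # field 5, the part after the first colon
    rest: List[str]        # any further columns, kept as-is


def parse_source_line(line: str) -> SourceLine:
    """Parse one tab-delimited line in the field order listed above."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError("expected at least 5 tab-delimited fields")
    number, sc_label, score, pid_label, paper_id = fields[:5]
    if sc_label != "SC" or pid_label != "PID":
        raise ValueError("fields 2 and 4 must be the labels SC and PID")
    # Split on the first colon only, since a DOI may itself contain colons.
    db_code, sep, identifier = paper_id.partition(":")
    if not sep or not db_code or not identifier:
        raise ValueError("field 5 must look like 'database code:identifier'")
    return SourceLine(int(number), float(score), db_code, identifier, fields[5:])


def can_send_to_protein2go(record: SourceLine) -> bool:
    """True only when the paper identifier is a PubMed ID or a DOI."""
    return record.db_code.lower() in PROTEIN2GO_CODES
```

With these assumptions, a line such as `1<TAB>SC<TAB>12.5<TAB>PID<TAB>PMID:12345` parses to sentence number 1 with score 12.5 and qualifies for Protein2GO, while a line whose fifth field carries a TAIR document ID parses normally but does not qualify.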