Automated descriptions for C. briggsae

From WormBaseWiki

Revision as of 19:56, 30 October 2014

Location of project-related files on Textpresso

http://textpresso-dev.caltech.edu/concise_descriptions/

Location of the concise description files for C. elegans:

  • For viewing the latest dump:

http://tazendra.caltech.edu/~postgres/cgi-bin/data/concise_dump_new.ace

  • Script: /home/postgres/work/citace_upload/concise/dump_concise.pl
  • File location: /home/postgres/public_html/cgi-bin/data/concise_dump_new.ace

Semantic categories in a Concise Description for C. briggsae

1. Orthology/Similarity to C. elegans and human
2. Processes
3. Molecular Function
4. Sub-cellular localization (Cell component)

Source files for homology data

1. Orthologs file:

2. Best BlastP hits file: ftp://ftp.sanger.ac.uk/pub/wormbase/releases/WS245/species/c_briggsae/PRJNA1073/c_briggsae.PRJNA10731.WS245.best_blastp_hits.txt.gz

  • Contact: Michael Paulini

Source file for Process, Molecular Function and Sub-cellular localization (Cell component) data

  • Process:
    • Need data from these rows:
      • where column 9 has value 'P' (Process)
      • column 2 (DB_Object ID), e.g., WBGene00000307
      • column 3 (DB_Object symbol), e.g., Cbr-bli-4
      • column 5 (GO ID), e.g., GO:0006508
      • column 6 (DB:Reference), e.g., PMID:12062106; take all references that are pipe-separated
      • column 7 (Evidence code), e.g., IEA
      • column 8 ('With (or) From'), e.g., INTERPRO:IPR000209
  • Molecular Function:
    • Need data from these rows:
      • where column 9 has value 'F' (Molecular Function)
      • column 2 (DB_Object ID), e.g., WBGene00000307
      • column 3 (DB_Object symbol), e.g., Cbr-bli-4
      • column 5 (GO ID), e.g., GO:0004252
      • column 6 (DB:Reference), e.g., PMID:12520011; take all references that are pipe-separated
      • column 7 (Evidence code), e.g., IEA
      • column 8 ('With (or) From'), e.g., INTERPRO:IPR000209
  • Sub-cellular localization (Cell component):
    • Need data from these rows:
      • where column 9 has value 'C' (Cellular Component)
      • column 2 (DB_Object ID), e.g., WBGene00000324
      • column 3 (DB_Object symbol), e.g., Cbr-exp-2
      • column 5 (GO ID), e.g., GO:0008076
      • column 6 (DB:Reference), e.g., PMID:12520011; take all references that are pipe-separated
      • column 7 (Evidence code), e.g., IEA
      • column 8 ('With (or) From'), e.g., INTERPRO:IPR000209
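
The column selection above could be sketched roughly as follows. This is only an illustration, not the project's actual script: it assumes the source file is a tab-separated GO annotation (GAF-style) file whose comment lines start with '!', and the function name parse_gaf_lines, the output field names, and the sample rows (assembled from the example values in the list) are all hypothetical.

```python
from collections import defaultdict

# Column 9 value -> category name used in this sketch (names are assumptions).
ASPECTS = {"P": "process", "F": "function", "C": "component"}


def parse_gaf_lines(lines):
    """Group the needed columns by gene ID (column 2) and by column 9 value."""
    by_gene = defaultdict(lambda: defaultdict(list))
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[8] not in ASPECTS:
            continue
        by_gene[cols[1]][ASPECTS[cols[8]]].append({
            "symbol": cols[2],                    # column 3
            "go_id": cols[4],                     # column 5
            "references": cols[5].split("|"),     # column 6, all pipe-separated refs
            "evidence": cols[6],                  # column 7
            "with_from": cols[7].split("|") if cols[7] else [],  # column 8
        })
    return by_gene


if __name__ == "__main__":
    # Hypothetical rows built from the example values listed above.
    sample = [
        "!gaf-version: 2.0\n",
        "WB\tWBGene00000307\tCbr-bli-4\t\tGO:0006508\tPMID:12062106\tIEA\tINTERPRO:IPR000209\tP\n",
        "WB\tWBGene00000307\tCbr-bli-4\t\tGO:0004252\tPMID:12520011\tIEA\tINTERPRO:IPR000209\tF\n",
    ]
    for gene_id, categories in parse_gaf_lines(sample).items():
        for category, annotations in categories.items():
            print(gene_id, category, [a["go_id"] for a in annotations])
```

Grouping by gene first and category second keeps all three kinds of annotation for one gene together, which is convenient when the sentences for that gene are later written in a fixed order.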

Template for a C. briggsae gene description

For the test phase, order of sentences:

  • Orthology
  • Process
  • Function/identity
  • Component
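
As a minimal sketch of the sentence order above, the snippet below joins whichever sentences are available for a gene in that fixed order. The function name assemble_description, the category keys, and the placeholder sentences are hypothetical; the actual sentence wording is not specified here.

```python
# Fixed order of sentences for the test phase (keys are assumed names).
SENTENCE_ORDER = ("orthology", "process", "function", "component")


def assemble_description(sentences):
    """Join the available sentences in template order, skipping empty ones."""
    ordered = []
    for key in SENTENCE_ORDER:
        text = (sentences.get(key) or "").strip()
        if text:
            ordered.append(text)
    return " ".join(ordered)


if __name__ == "__main__":
    # Placeholder text only; real sentences would come from the data sources.
    example = {
        "component": "<component sentence>",
        "orthology": "<orthology sentence>",
        "function": "<function sentence>",
    }
    print(assemble_description(example))
```

A gene with no data for a category simply gets a shorter description, while the relative order of the remaining sentences stays the same.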