Difference between revisions of "June 2012 - Weekly Pipeline Set-up"

From WormBaseWiki
Line 12:
 ***CCC verbs
 **For now, the source file format will be the same as what we used for the dicty searches for the BioCreative task.
-***For an example, please see:  [[http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/dicty/dicty_ccc.cgi]]
+***For an example, please see:  [[http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/dicty/dicty_ccc.cgi dicty curation form]]
 *Outstanding issues

Revision as of 16:03, 26 June 2012

Goal: To set up a weekly CCC pipeline for dictyBase.

  • Proposed workflow from Petra
    • On a weekly basis:
      • Monday - dictyBase will identify PDFs for curation
      • Tuesday - Textpresso will process new PDFs, perform CCC search, and send source file output to CCC curation form on tazendra
      • CCC curation performed by dictyBase curators will be stored on tazendra and exported to the protein2go database at EBI (UniProt-GOA)
    • We will use four Textpresso categories for the searches
      • dicty gene
      • CCC TAIR
      • CCC assay terms
      • CCC verbs
    • For now, the source file format will be the same as what we used for the dicty searches for the BioCreative task.
  • Outstanding issues
    • dicty gene names
      • known issues: 'myosin' vs 'myosin IE'. Petra's suggestion: remove 'myosin' from the gene list
      • Greek characters in gene names - can this be addressed at all?
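The two gene-name issues above can be illustrated with a small sketch. This is not how Textpresso handles its categories; it is only one possible approach, written under stated assumptions. The function names, the Greek-to-Latin mapping, and the gene name 'abcalpha' are all hypothetical. The idea is to spell out Greek letters before matching, and to try longer names first so that 'myosin IE' is not reported as a bare 'myosin' hit.

```python
import re
import unicodedata

# Hypothetical, deliberately incomplete mapping of Greek letters to
# spelled-out names; a real list would need to be agreed with curators.
GREEK_TO_LATIN = {"α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta"}


def normalize_greek(text):
    """Return text with the mapped Greek letters spelled out in Latin."""
    text = unicodedata.normalize("NFC", text)
    for greek, name in GREEK_TO_LATIN.items():
        text = text.replace(greek, name)
    return text


def build_gene_pattern(gene_names):
    """Compile one regex that prefers the longest matching gene name."""
    # Sorting longest-first makes 'myosin IE' win over 'myosin'
    # when both could match at the same position.
    ordered = sorted({normalize_greek(n) for n in gene_names},
                     key=len, reverse=True)
    alternation = "|".join(re.escape(n) for n in ordered)
    # The lookarounds keep a name from matching inside a longer word.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)")


def find_genes(text, gene_names):
    """List gene-name matches in text, in order of appearance."""
    pattern = build_gene_pattern(gene_names)
    return [m.group(0) for m in pattern.finditer(normalize_greek(text))]
```

With this kind of matching, 'myosin' could stay in the gene list without swallowing 'myosin IE', although a bare 'myosin' would still be reported wherever it appears on its own. Removing it from the list, as Petra suggests, remains the simpler option if those hits are not wanted.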