June 2012 - Weekly Pipeline Set-up

From WormBaseWiki

Revision as of 16:00, 26 June 2012

Goal: To set up a weekly CCC pipeline for dictyBase.

  • Proposed workflow from Petra
    • On a weekly basis:
      • Monday - dictyBase will identify PDFs for curation
      • Tuesday - Textpresso will process new PDFs, perform the CCC search, and send source file output to the CCC curation form on tazendra
      • CCC curation performed by dictyBase curators will be stored on tazendra and exported to the protein2go database at EBI (UniProt-GOA)
  • We will use four Textpresso categories for the searches
    • dicty gene
    • CCC TAIR
      • CCC assay terms
      • CCC verbs
  • For now, the source file format will be the same as what we used for the dicty searches for the BioCreative task.
  • Outstanding issues
    • dicty gene names
      • known issues: 'myosin' vs 'myosin IE' - Petra's suggestion: remove 'myosin' from the gene list
      • Greek characters in gene names - can this be addressed at all?
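
The two gene-name issues above can be illustrated with a small Python sketch. It is not how Textpresso matches names; it only shows one possible way to approach the problem. The function names, the Greek-letter table, and the example name 'Gα2' are assumptions made for illustration. Preferring the longest match is an alternative to removing 'myosin' from the gene list, not part of Petra's suggestion.

```python
import re
import unicodedata

# Hypothetical table of spelled-out Greek letters; extend as needed.
GREEK_TO_ASCII = {"α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta"}


def normalize_name(name):
    """Casefold a name and spell out the Greek letters listed above."""
    text = unicodedata.normalize("NFKC", name).casefold()
    for greek, spelled in GREEK_TO_ASCII.items():
        text = text.replace(greek, spelled)
    return text


def find_gene_mentions(sentence, gene_names):
    """Return normalized gene names found in the sentence, longest match first."""
    # Longer names come first so 'myosin IE' is tried before 'myosin'.
    candidates = sorted(
        {normalize_name(g) for g in gene_names},
        key=lambda c: (-len(c), c),
    )
    if not candidates:
        return []
    alternation = "|".join(re.escape(c) for c in candidates)
    # Require non-word characters (or string edges) around each match.
    pattern = re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)")
    return [m.group(0) for m in pattern.finditer(normalize_name(sentence))]


if __name__ == "__main__":
    genes = ["myosin", "myosin IE", "Galpha2"]
    print(find_gene_mentions("Myosin IE binds actin.", genes))
    print(find_gene_mentions("Gα2 is required for aggregation.", genes))
```

With this approach, a sentence that mentions 'myosin IE' yields only the longer name, and a gene written with a Greek letter matches its spelled-out entry in the list. Whether either step suits the actual dicty gene category would need to be checked against the real gene list.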