June 2012 - Weekly Pipeline Set-up
Goal: To set up a weekly CCC pipeline for dictyBase.
- Proposed workflow from Petra
- On a weekly basis:
- Monday - dictyBase will identify PDFs for curation
- Tuesday - Textpresso will process new PDFs, perform CCC search, and send source file output to CCC curation form on tazendra
- CCC curation performed by dictyBase curators will be stored on tazendra and exported to protein2go database at EBI (UniProt-GOA)
- We will use four Textpresso categories for the searches
- dicty gene
- CCC TAIR
- CCC assay terms
- CCC verbs
- For now, the source file format will be the same as what we used for the dicty searches for the BioCreative task.
- However, we should consider naming the file by the date the search was performed so curators see that on the curation form.
- For an example, please see: dicty curation form
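The suggestion above to name the source file by search date could be sketched as follows. This is only a minimal illustration: the `dicty_ccc` prefix, the `.txt` extension, and the function name are assumptions made for the example, not the naming scheme actually used on tazendra.

```python
from datetime import date


def source_file_name(search_date: date, prefix: str = "dicty_ccc") -> str:
    """Build a source file name that embeds the date the search was run.

    The prefix and the .txt extension are hypothetical placeholders.
    """
    # ISO format (YYYY-MM-DD) sorts chronologically when sorted as text.
    return f"{prefix}_{search_date.isoformat()}.txt"


print(source_file_name(date(2012, 6, 12)))  # dicty_ccc_2012-06-12.txt
```

An ISO-style date in the name has the side benefit that an alphabetical listing of files on the curation form is also a chronological one.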
- Outstanding issues
- dicty gene names
- known issue: 'myosin' vs. 'myosin IE'; Petra's suggestion: remove 'myosin' from the gene list
- Greek characters in gene names - can this be addressed at all?
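One possible way to handle both gene-name issues is sketched below, purely as an illustration. It assumes the gene list is available as plain strings; the Greek-to-Latin mapping, the function names, and the 'Gα' test string are hypothetical and are not part of the Textpresso software or the dictyBase gene list.

```python
# Hypothetical, incomplete mapping; extend with whichever letters occur in the gene list.
GREEK_TO_LATIN = {"α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta"}


def normalize_greek(name: str) -> str:
    """Spell out Greek letters so 'α' and 'alpha' forms of a name compare equal."""
    return "".join(GREEK_TO_LATIN.get(ch, ch) for ch in name)


def drop_generic_names(gene_names, generic=("myosin",)):
    """Remove overly generic entries (e.g. 'myosin') from the gene list.

    More specific names such as 'myosin IE' are kept.
    """
    blocked = {g.lower() for g in generic}
    return [n for n in gene_names if n.lower() not in blocked]


print(drop_generic_names(["myosin", "myosin IE"]))
print(normalize_greek("Gα"))
```

Whether spelling out Greek letters is the right fix depends on how the names appear in the PDFs after text conversion, which would need to be checked first.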
- Textpresso web sites for dicty
- currently, two web sites: ccc_for_dicty and Textpresso for dicty
- we should choose one name and stick with it; I would propose we go with www.textpresso.org/dicty, to keep it simple
- ccc_for_dicty site is currently running the most up-to-date Textpresso software, but I cannot find the dicty gene or dicty phenotype categories on this site right now
- we need to clean up the categories on the site we're going to use - make sure we have the correct dicty categories and remove some of the C. elegans-specific categories (for example, WormBase C. elegans phenotype is the second category listed under Biological Concepts while a dicty curator has to search down the list for any dicty-related categories)
- CCC curation form
- We will need a better interface for source file display - perhaps a scrolling menu like the category menu on the Textpresso site.
- We could sort by year, month, week, etc.
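As a rough sketch of the sorting idea, source files could be grouped by the week in which the search was run. The example assumes each file name contains an ISO date (YYYY-MM-DD); that naming convention and the file names shown are hypothetical, and the actual curation form may store dates differently.

```python
import re
from collections import defaultdict
from datetime import date

# Assumes a YYYY-MM-DD date somewhere in the file name (hypothetical convention).
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def group_by_week(file_names):
    """Group file names by (ISO year, ISO week); names without a date are skipped."""
    groups = defaultdict(list)
    for name in file_names:
        match = DATE_RE.search(name)
        if match is None:
            continue
        year, month, day = (int(part) for part in match.groups())
        iso_year, iso_week, _weekday = date(year, month, day).isocalendar()
        groups[(iso_year, iso_week)].append(name)
    # Return weeks in chronological order, with file names sorted within each week.
    return {key: sorted(names) for key, names in sorted(groups.items())}


files = ["dicty_ccc_2012-06-12.txt", "dicty_ccc_2012-06-05.txt", "notes.txt"]
for week, names in group_by_week(files).items():
    print(week, names)
```

The same grouping key could be swapped for `(year, month)` or just `year` to drive a scrolling menu at a coarser level.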
- Output to protein2go tool
- What file format? GPAD or GAF?
- Will need to consult with Tony and Rachael on this.