Difference between revisions of "TAIR"
Revision as of 15:30, 28 October 2010
Gene Ontology Curation at TAIR
Specifications for Curation Pipeline
Summary
This document outlines the Arabidopsis Textpresso for CCC pipeline. The initial trial run will cover all papers in the Textpresso for Arabidopsis corpus published in 2008. Results will be split into two files: results from papers already curated for GO Cellular Component, and results from papers not yet curated for GO Cellular Component. Curation made using an online curation form will be output in Gene Association file format for incorporation into the main TAIR file submitted to GO.
Pipeline Specifics
Paper Acquisition
- TAIR curator sends PDFs of Arabidopsis or other relevant papers to Textpresso (Michael) approximately every six months.
- Other relevant papers without a PDF are downloaded by the Textpresso team, if possible.
Textpresso Search for Cellular Component Annotations
- Search Arabidopsis corpus using four categories:
1) CCC assay terms
2) CCC cellular component
3) CCC verbs
4) genes (arabidopsis)
- Filters
1) 2008
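The search step can be illustrated with a minimal Python sketch. It assumes that a sentence counts as a hit only when it contains at least one term from each of the four categories, and that each sentence carries its paper's publication year; neither detail is stated above. The term lists, the gene name `xyz1`, and the function names are hypothetical placeholders, not the actual Textpresso category contents or API.

```python
import re

# Hypothetical, tiny term lists standing in for the four Textpresso categories.
# The real category contents are much larger and are not given in this document.
CATEGORIES = {
    "ccc_assay_terms": {"gfp", "immunofluorescence", "fractionation"},
    "ccc_cellular_component": {"nucleus", "chloroplast", "plasma membrane"},
    "ccc_verbs": {"localized", "localizes", "detected"},
    "genes_arabidopsis": {"xyz1"},  # made-up gene name
}


def matches_all_categories(sentence, categories=CATEGORIES):
    """Return True if the sentence contains a term from every category.

    Assumption: all four categories must match within the same sentence.
    """
    text = sentence.lower()
    for terms in categories.values():
        found = any(
            re.search(r"\b" + re.escape(term) + r"\b", text) for term in terms
        )
        if not found:
            return False
    return True


def search(sentences, year=2008):
    """Apply the publication-year filter, then the four-category match."""
    return [
        s for s in sentences
        if s["year"] == year and matches_all_categories(s["text"])
    ]
```

Each input record here is a dict with `year` and `text` keys, which is an illustrative layout only.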
Curation - Sentence Input and Annotation
- The output of the searches will be sentences sorted into two separate files. The criterion for sorting is whether any Cellular Component Annotations exist for that paper in TAIR's Gene Association file.
- Each file will be available in the curation form for curators to assess search results.
- Results of searches will be presented in the curation form, in a three-column format: 1) gene or protein name, 2) cellular component category term, 3) if available, a suggested GO term from the existing category term-GO term relationship index.
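The sorting criterion and the three-column layout above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the Gene Association file is read as tab-separated lines with the reference in column 6 and the aspect in column 9 (as in the GO annotation file format), and the hit records, function names, and term-to-GO index are hypothetical.

```python
def split_by_existing_annotation(hits, gaf_lines):
    """Split search hits into (already curated, not yet curated) lists.

    A paper counts as curated if the Gene Association file holds at least
    one Cellular Component (aspect "C") annotation citing that reference.
    """
    curated_refs = set()
    for line in gaf_lines:
        if line.startswith("!"):  # header/comment lines in the file
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) > 8 and cols[8] == "C":
            # Column 6 may hold several references separated by "|".
            curated_refs.update(cols[5].split("|"))

    curated, uncurated = [], []
    for hit in hits:
        target = curated if hit["reference"] in curated_refs else uncurated
        target.append(hit)
    return curated, uncurated


def form_row(hit, term_to_go):
    """Build one three-column row: gene, component term, suggested GO term.

    The third column is None when the index has no entry for the term.
    """
    term = hit["component_term"]
    return (hit["gene"], term, term_to_go.get(term))
```

For example, with an index of `{"nucleus": "GO:0005634"}`, a hit whose component term is "nucleus" would get GO:0005634 as its suggestion, while an unindexed term would leave the third column empty.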
Curation - Output to Gene Association File
Curation will be output in the form of a Gene Association file that can be picked up by the TAIR curators and added to the main Gene Association file that they submit to GO.
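As a rough illustration of the output step, the Python sketch below formats one curated annotation as a tab-separated line. It assumes the 17-column GAF 2.0 layout; the document does not say which file version the pipeline targets. The function name, the default evidence code, the object type, and the identifiers in the usage note are hypothetical choices rather than settings taken from the pipeline.

```python
from datetime import date


def to_gaf_line(gene_id, symbol, go_id, reference,
                evidence="IDA", assigned_by="TAIR", on=None):
    """Format one Cellular Component annotation as a GAF 2.0 line.

    Assumptions: evidence code, object type, and assigned-by values are
    illustrative defaults; optional columns are left empty.
    """
    on = on or date.today()
    cols = [
        "TAIR",                  # 1  DB
        gene_id,                 # 2  DB Object ID
        symbol,                  # 3  DB Object Symbol
        "",                      # 4  Qualifier
        go_id,                   # 5  GO ID
        reference,               # 6  DB:Reference
        evidence,                # 7  Evidence Code
        "",                      # 8  With/From
        "C",                     # 9  Aspect (C = Cellular Component)
        "",                      # 10 DB Object Name
        "",                      # 11 DB Object Synonym
        "gene",                  # 12 DB Object Type (assumed)
        "taxon:3702",            # 13 Taxon (Arabidopsis thaliana)
        on.strftime("%Y%m%d"),   # 14 Date
        assigned_by,             # 15 Assigned By
        "",                      # 16 Annotation Extension
        "",                      # 17 Gene Product Form ID
    ]
    return "\t".join(cols)
```

A call such as `to_gaf_line("locus:0000000", "XYZ1", "GO:0005634", "PMID:00000000")` uses placeholder identifiers; real lines would carry the TAIR object ID and the reference of the curated paper.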