TAIR CCC
Revision as of 17:02, 9 December 2010
Specifications for Curation Pipeline
Summary
This document outlines the Textpresso for Arabidopsis CCC pipeline for the initial trial run.
The trial run will search all papers published in 2008 that are in the Textpresso for Arabidopsis corpus.
Search results will be stored in three files:
1) all sentences returned by the search
2) sentences from papers already curated by TAIR for GO Cellular Component
3) sentences from papers not curated by TAIR for GO Cellular Component
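The three-file split above can be sketched as a simple partition on paper ID. This is a minimal illustration only: the Sentence record, the function names, and the tab-separated file layout are assumptions, since the pipeline's actual sentence format and file names are not given here.

```python
from collections import namedtuple

# Hypothetical record type; the real search output format is not specified.
Sentence = namedtuple("Sentence", ["paper_id", "text"])


def partition_by_curation(sentences, curated_paper_ids):
    """Split search hits into (all, curated, uncurated) lists.

    curated_paper_ids is assumed to be the set of papers TAIR has
    already curated for GO Cellular Component.
    """
    curated_ids = set(curated_paper_ids)
    all_hits = list(sentences)
    curated = [s for s in all_hits if s.paper_id in curated_ids]
    uncurated = [s for s in all_hits if s.paper_id not in curated_ids]
    return all_hits, curated, uncurated


def write_sentences(path, sentences):
    """Write one 'paper_id<TAB>text' line per sentence (assumed layout)."""
    with open(path, "w", encoding="utf-8") as handle:
        for s in sentences:
            handle.write(f"{s.paper_id}\t{s.text}\n")
```

Every sentence lands in file 1 and in exactly one of files 2 and 3, so the sizes of the curated and uncurated files should add up to the size of the full file.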
Annotations can be made using an on-line curation form:
http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/tair/tair_ccc.cgi
The form produces two different outputs:
1) a three-column 'user submission' output
2) a standard GO Gene Association File (GAF) format
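For the GAF output, a row for a Cellular Component annotation might be assembled as below. This is a sketch assuming the 17-column GAF 2.0 layout; which GAF version the form actually emits is not stated here, and the function name, the "gene" object type, and the example values are placeholders. The three-column 'user submission' layout is not specified in this document, so it is not sketched.

```python
import datetime


def gaf_line(gene_id, symbol, go_id, reference, evidence, date,
             assigned_by="TAIR"):
    """Build one tab-separated GAF 2.0 row for a Cellular Component term."""
    fields = [
        "TAIR",                   # 1  DB
        gene_id,                  # 2  DB Object ID
        symbol,                   # 3  DB Object Symbol
        "",                       # 4  Qualifier (optional)
        go_id,                    # 5  GO ID
        reference,                # 6  DB:Reference
        evidence,                 # 7  Evidence Code
        "",                       # 8  With/From (optional)
        "C",                      # 9  Aspect: C = Cellular Component
        "",                       # 10 DB Object Name (optional)
        "",                       # 11 DB Object Synonym (optional)
        "gene",                   # 12 DB Object Type (assumed value)
        "taxon:3702",             # 13 Taxon: Arabidopsis thaliana
        date.strftime("%Y%m%d"),  # 14 Date as YYYYMMDD
        assigned_by,              # 15 Assigned By
        "",                       # 16 Annotation Extension (optional)
        "",                       # 17 Gene Product Form ID (optional)
    ]
    return "\t".join(fields)


# Placeholder identifiers, not real annotations.
example = gaf_line("EXAMPLE_ID", "EXAMPLE_SYMBOL", "GO:0005634",
                   "PMID:0000000", "IDA", datetime.date(2010, 12, 9))
```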
Pipeline Details
Paper Acquisition
- TAIR corpus averages ~2500 papers/year
- TAIR curator sends PDFs of papers to be included in the corpus to Textpresso (Michael) approximately every six months
- Any additional papers for which TAIR does not have a PDF are downloaded by Textpresso team, if possible
Textpresso Search
Categories
- Search Arabidopsis corpus on:
http://www.textpresso.org/arabidopsis/
Using these four categories:
1) CCC assay terms
2) CCC cellular components
3) CCC verbs
4) genes (arabidopsis)
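One way to picture a four-category search is that a sentence is returned only if it contains at least one term from every category. The sketch below assumes that behaviour and uses simple case-insensitive whole-word matching; it is not Textpresso's own matching code, and the toy terms are invented for illustration rather than taken from the real lexica.

```python
import re

# Invented example terms; the real lexica live on textpresso-dev.
TOY_LEXICA = {
    "CCC assay terms": ["GFP fusion"],
    "CCC cellular components": ["nucleus"],
    "CCC verbs": ["localizes"],
    "genes (arabidopsis)": ["GENE1"],
}


def contains_term(sentence, terms):
    """True if any term occurs as a whole word or phrase, ignoring case."""
    for term in terms:
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            return True
    return False


def matches_all_categories(sentence, lexica):
    """True only if every category contributes at least one matching term."""
    return all(contains_term(sentence, terms) for terms in lexica.values())
```

For example, a sentence that names a gene, a verb, and a component but no assay term would be rejected, because the "CCC assay terms" category has no match.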
Paths for categories on textpresso-dev.caltech.edu:
/data2/data-processing/data/arabidopsis/Data/ontology/lexica
1) CCC assay terms
2) CCC cellular components
3) CCC verbs
Juancarlos, can you fill in the exact path names we used here? --K
Path name for arabidopsis genes on textpresso-dev.caltech.edu:
4) genes (arabidopsis)
/data2/data-processing/data/arabidopsis/Data/ontology/lexica/genes_arabidopsis.0-gram
Juancarlos, can you confirm that this is the exact path we used for genes (arabidopsis)? --K