CCC Form 2.0 Specifications

From WormBaseWiki

Revision as of 20:39, 7 November 2012

This page documents specifications for the next version of the Textpresso for Cellular Component Curation (CCC) tool. Curators suggested the changes to the tool and pipeline. The changes are also part of the broader plan for Textpresso-based curation pipelines and the GO's Common Annotation Framework.

Tool Features

Textpresso search specifications

  • Frequency
  • Corpus
  • Categories
  • Filtering (Textpresso)
  1. Journal
  2. Date
  3. Document IDs
  • Filtering (non-Textpresso)
  1. SVM
  2. Gene Ontology Gene Association File
  • Ranking search results
  • Naming search results file
  • Storing search histories
  1. Recording the version of the pdf2text conversion
  2. Recording the version of the categories used
  3. Recording the search criteria, i.e. categories, corpus, and filters
  4. Recording curator and date of search
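A stored search history could be as simple as one record per search that captures the items listed above. The following is a minimal Python sketch built on several assumptions. The `SearchHistory` fields, the `results_file_name` naming convention and JSON as the storage format are all hypothetical choices made for illustration. None of them is part of Textpresso or the current CCC tool.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class SearchHistory:
    """One stored Textpresso search (hypothetical schema, not an existing Textpresso API)."""
    results_file: str        # name of the search results file
    pdf2text_version: str    # version of the pdf2text conversion used for the corpus
    categories_version: str  # version of the Textpresso categories used
    corpus: str
    categories: list[str]
    filters: dict[str, str] = field(default_factory=dict)  # e.g. journal, date, document IDs
    curator: str = ""
    search_date: str = field(default_factory=lambda: date.today().isoformat())


def results_file_name(corpus: str, categories: list[str], curator: str, on: date) -> str:
    """Build a descriptive results-file name from the search criteria (hypothetical convention)."""
    # Sorting makes the name independent of the order in which categories were selected.
    cats = "+".join(sorted(c.replace(" ", "_") for c in categories))
    return f"{corpus}_{cats}_{curator}_{on.isoformat()}.txt"


def history_to_json(history: SearchHistory) -> str:
    """Serialize a search history record so it can be stored alongside the results file."""
    return json.dumps(asdict(history), sort_keys=True)
```

For example, `results_file_name("celegans", ["gene", "cellular component"], "curator1", date(2012, 11, 7))` returns `celegans_cellular_component+gene_curator1_2012-11-07.txt`. Because the categories are sorted, the same search always produces the same file name.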
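The specification lists the GO Gene Association File as a filter but does not say how to apply it. One plausible reading is to drop search results from papers that an existing GAF already cites as references for cellular component annotations. The Python sketch below follows that reading, which is an assumption and not a decided design. The column positions follow the GAF 2.x format: column 6 is DB:Reference and column 9 is Aspect. The `filter_by_gaf` helper is illustrative only. So is the representation of each search result as a dict with a hypothetical `"paper"` key.

```python
from typing import Iterable


def annotated_references(gaf_lines: Iterable[str], aspect: str = "C") -> set[str]:
    """Collect the references (GAF column 6) of existing annotations in one GO aspect (column 9)."""
    refs: set[str] = set()
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and GAF header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[8] != aspect:
            continue  # malformed line, or an annotation to another aspect
        refs.update(cols[5].split("|"))  # column 6 may list several pipe-separated references
    return refs


def filter_by_gaf(results: list[dict], gaf_lines: Iterable[str], aspect: str = "C") -> list[dict]:
    """Drop search results whose paper already has an annotation in the given aspect (hypothetical)."""
    done = annotated_references(gaf_lines, aspect)
    return [r for r in results if r["paper"] not in done]
```

GAF references carry a prefix (e.g. `PMID:`). For the comparison to work, paper identifiers in the search results would need to use the same prefixes.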

Curation form

  • What data to store for each annotation:
  1. Name of search results file
  2. Paper identifier
  3. Gene/gene product identifier
  4. Textpresso component term
  5. GO component term
  6. Evidence code
  7. Sentence
  8. Sentence classification
  9. Curator
  10. Annotation date
  11. Annotation history
  • Curator login
  • Import of search results files
  1. Can this be automated?
  • Organization of search results file
  1. If we have many results files, can we organize them in a neater way than one long list? The Textpresso categories use cascading menus, which is one possible solution.
  • Selection of search results file for curation
  • Display of paper bibliographic information
  1. This refers to displaying more information than we currently do, and to improving how we display it. I think we could get the additional information from Textpresso and then tidy up the display with some spacing, bold text, etc. See PubMed for one possible example: [1]
  • Search functionality on the form (includes some new features)
  1. Gene - search for all annotations/sentences that mention a gene, only the sentences from which an annotation was made, or only the sentences from which no annotation was made
  2. Paper
  3. Curator
  4. Annotation date
  5. Component term in sentence
  6. GO term used for annotation
  • Curation when all entities are recognized
  • Curation when one or more entities are not recognized and need to be added
  • Feedback from form to Textpresso
  1. Enter a new gene name and identifier
  2. Enter a new component term in sentence, add to Textpresso cellular component category
  • Evidence codes
  1. IDA (default), IPI (complex membership)
  • Edit a previous annotation
  1. Change gene annotated, change component term used, change GO term assigned, change evidence code
  • Edit relationship index
  1. This would be a separate function that allows a curator to view and edit the relationship index if needed.
  • Delete a search results file
  1. This could be tricky. We'd need to make sure no annotations are associated with that search results file.
  • Export annotations
  1. To a MOD
  2. To Protein2GO
  3. As a file - GO Gene Association File (GAF)
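The per-annotation fields listed above map fairly directly onto a record type. Exporting to a GAF is then a matter of writing one tab-separated line per annotation. The Python sketch below is hypothetical, and these pieces are assumptions made for illustration:

- the `Annotation` class and the `to_gaf_line` helper;
- the defaults: WormBase as `WB`, C. elegans as `taxon:6239`, and object type `gene`;
- the restriction to IDA and IPI, which follows the evidence codes listed above.

Several fields have no GAF column and would stay in the CCC database: the Textpresso term, the sentence and the sentence classification.

```python
from dataclasses import dataclass, field
from datetime import date

EVIDENCE_CODES = {"IDA", "IPI"}  # IDA is the default; IPI for complex membership


@dataclass
class Annotation:
    """One CCC annotation with the fields listed above (hypothetical schema)."""
    results_file: str
    paper_id: str          # e.g. "PMID:..." or a MOD paper ID
    gene_id: str           # MOD gene identifier
    gene_symbol: str
    textpresso_term: str   # component term as it appears in the sentence
    go_id: str             # GO cellular component term, e.g. "GO:0005737"
    sentence: str
    sentence_class: str
    curator: str
    annotation_date: date
    evidence_code: str = "IDA"
    with_from: str = ""    # IPI annotations normally also need a With/From entry
    history: list[str] = field(default_factory=list)  # earlier edits, oldest first


def to_gaf_line(a: Annotation, db: str = "WB", taxon: str = "taxon:6239",
                assigned_by: str = "WB") -> str:
    """Render an annotation as one tab-separated, 17-column GAF line."""
    if a.evidence_code not in EVIDENCE_CODES:
        raise ValueError(f"unsupported evidence code: {a.evidence_code}")
    cols = [
        db,                                    # 1  DB
        a.gene_id,                             # 2  DB Object ID
        a.gene_symbol,                         # 3  DB Object Symbol
        "",                                    # 4  Qualifier (left empty, see note below)
        a.go_id,                               # 5  GO ID
        a.paper_id,                            # 6  DB:Reference
        a.evidence_code,                       # 7  Evidence Code
        a.with_from,                           # 8  With (or) From
        "C",                                   # 9  Aspect: cellular component
        "",                                    # 10 DB Object Name
        "",                                    # 11 DB Object Synonym
        "gene",                                # 12 DB Object Type (assumed)
        taxon,                                 # 13 Taxon
        a.annotation_date.strftime("%Y%m%d"),  # 14 Date
        assigned_by,                           # 15 Assigned By
        "",                                    # 16 Annotation Extension
        "",                                    # 17 Gene Product Form ID
    ]
    return "\t".join(cols)
```

The Qualifier column is left empty here, which is acceptable in GAF 2.0/2.1. GAF 2.2 requires a relation in that column, such as `located_in` or `part_of` for cellular component annotations. An export targeting 2.2 would therefore need to record the relation for each annotation.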
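Two features above work over the sentences in the results files: the gene search options and the check before deleting a search results file. Below is a minimal in-memory Python sketch of both. The `SentenceRecord` model and the `search_by_gene` and `can_delete_results_file` functions are hypothetical. A real implementation would presumably run equivalent queries against the CCC database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SentenceRecord:
    """A sentence from a search results file (hypothetical in-memory model)."""
    results_file: str
    paper_id: str
    text: str
    genes: frozenset[str]  # gene identifiers recognized in the sentence
    annotated: bool        # True if at least one annotation was made from it


def search_by_gene(records: list[SentenceRecord], gene_id: str,
                   annotated: Optional[bool] = None) -> list[SentenceRecord]:
    """annotated=None returns every sentence mentioning the gene;
    True returns only annotated sentences; False returns only unannotated ones."""
    return [r for r in records
            if gene_id in r.genes and (annotated is None or r.annotated == annotated)]


def can_delete_results_file(records: list[SentenceRecord], results_file: str) -> bool:
    """A results file is safe to delete only if none of its sentences were annotated."""
    return not any(r.results_file == results_file and r.annotated for r in records)
```

The `annotated` argument covers the three gene search modes listed above:

- `None`: all sentences that mention the gene;
- `True`: only the sentences that yielded an annotation;
- `False`: only the sentences that did not.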

Files needed

  1. Mapping file from gene names and synonyms to MOD identifiers and UniProtKB identifiers
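The format of the mapping file is not specified. As an assumption for illustration, the Python sketch below reads a tab-separated file and builds a lookup table. Each row holds one gene name or synonym, its MOD identifier and its UniProtKB accession. Both the `load_gene_mapping` function and this file layout are hypothetical.

```python
import csv
from typing import Iterable


def load_gene_mapping(lines: Iterable[str]) -> dict[str, set[tuple[str, str]]]:
    """Map each gene name or synonym (lower-cased) to its (MOD ID, UniProtKB ID) pairs.

    Assumes a hypothetical tab-separated layout: name, MOD ID, UniProtKB accession,
    one row per name or synonym; a leading '#' marks a comment line.
    """
    mapping: dict[str, set[tuple[str, str]]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        row = (row + ["", "", ""])[:3]  # tolerate a missing UniProtKB column
        name, mod_id, uniprot_id = (col.strip() for col in row)
        if not name or not mod_id:
            continue  # a name without an identifier cannot be mapped
        # A set keeps ambiguous synonyms visible instead of silently overwriting them.
        mapping.setdefault(name.lower(), set()).add((mod_id, uniprot_id))
    return mapping
```

Each name maps to a set of identifier pairs rather than a single pair. This keeps synonyms shared by several genes visible, so the form could ask the curator to choose instead of silently picking one. Names are lower-cased so that a gene name and the same name in capitals resolve to the same entry. Whether that suits every MOD's nomenclature is an open question.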