Processing Gene and Protein Names for Searches and Curation

From WormBaseWiki

Revision as of 17:34, 8 August 2011

General Issue

Effective Textpresso searches, and subsequent curation, require the most complete lists of gene and protein names possible.

MODs (model organism databases) can supply a list of the gene (and possibly protein) names in their databases, each mapped to a database gene ID, but variant forms of these names still appear in the literature.
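The page does not specify the layout of the MOD mappings file. As a minimal sketch, assuming a hypothetical tab-separated layout with one database ID per line followed by that ID's names and synonyms, the file could be read into a dictionary like this. The function name `parse_mappings` and the IDs in the usage example are illustrative only.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def parse_mappings(lines: Iterable[str]) -> dict[str, list[str]]:
    """Map each database ID to its names and synonyms.

    Assumes a hypothetical layout: ID<TAB>name1<TAB>name2...
    Blank lines and lines starting with '#' are skipped. An ID that
    appears on several lines has its names merged, without duplicates.
    """
    mapping: dict[str, list[str]] = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        gene_id, *names = line.split("\t")
        for name in names:
            name = name.strip()
            if name and name not in mapping[gene_id]:
                mapping[gene_id].append(name)
    return dict(mapping)
```

For example, the lines `GENE0001<TAB>unc-22<TAB>twitchin` and `GENE0001<TAB>UNC-22` yield `{"GENE0001": ["unc-22", "twitchin", "UNC-22"]}`. Exact-string deduplication is deliberate here: case variants are handled in the processing step, not at load time.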

Common variations include changes in case and the addition of species prefixes to gene or protein names, e.g. At (Arabidopsis thaliana) or Ce (Caenorhabditis elegans).
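A minimal sketch of generating these variations for a single name follows. The prefix list, the function names, and the rule for deciding when a leading "At" or "Ce" really is a species prefix are all assumptions for illustration: the sketch strips a prefix only when it is followed by a hyphen or an uppercase letter, so that a name such as `Atg5` is left intact.

```python
from __future__ import annotations

# Hypothetical prefix list for illustration; a real pipeline would use the
# prefixes actually observed for the organism being curated.
SPECIES_PREFIXES = ("At", "Ce")


def strip_prefix(name: str, prefixes: tuple[str, ...] = SPECIES_PREFIXES) -> str | None:
    """Return `name` without a leading species prefix, or None if none applies."""
    for prefix in prefixes:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if rest.startswith("-"):
            rest = rest[1:]  # "Ce-unc-22" -> "unc-22"
        elif not rest[:1].isupper():
            # "Atg5": a lowercase letter after "At" suggests the prefix is
            # part of the name itself, so leave it alone.
            continue
        if rest:
            return rest
    return None


def case_variants(name: str) -> set[str]:
    """Original form plus all-lower, all-upper and initial-capital forms."""
    return {name, name.lower(), name.upper(), name[:1].upper() + name[1:].lower()}


def expand_name(name: str) -> set[str]:
    """Case variants of the name and of its prefix-stripped form, if any."""
    forms = {name}
    stripped = strip_prefix(name)
    if stripped is not None:
        forms.add(stripped)
    variants: set[str] = set()
    for form in forms:
        variants |= case_variants(form)
    return variants
```

For example, `expand_name("AtMYB2")` returns the case variants of both `AtMYB2` and `MYB2`, while `expand_name("unc-22")` returns only `unc-22`, `UNC-22` and `Unc-22`. Any real prefix rule would need checking against the organism's own naming conventions.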

Steps for Gene and Protein Name Processing

  • Obtain a mappings file from the MOD or database that maps each database ID to its gene or protein names and synonyms
  • Process the names and synonyms to add case variants and, where needed, strip prefixes
  • From the processed list, create a Textpresso category for searching
  • From the processed list, create an expanded mappings file for the CCC curation form
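The last two steps can be sketched as follows, assuming the processed names are held as a dictionary from database ID to names. The output layouts are hypothetical: the category is written as one search term per line, and the expanded mappings as an ID followed by its tab-separated variants. Neither is a documented Textpresso or CCC file format, and the function names are illustrative only.

```python
from __future__ import annotations

from typing import Callable, Iterable


def default_expand(name: str) -> set[str]:
    # Stand-in for the case/prefix processing step: case variants only.
    return {name, name.lower(), name.upper()}


def build_outputs(
    mapping: dict[str, list[str]],
    expand: Callable[[str], Iterable[str]] = default_expand,
) -> tuple[list[str], dict[str, list[str]]]:
    """Return (category terms, expanded ID -> variants mapping)."""
    expanded: dict[str, list[str]] = {}
    for gene_id, names in mapping.items():
        variants: set[str] = set()
        for name in names:
            variants.update(expand(name))
        expanded[gene_id] = sorted(variants)
    # A term shared by several IDs appears once in the category but stays
    # under every ID in the expanded mappings; curators resolve the ambiguity.
    category = sorted({term for terms in expanded.values() for term in terms})
    return category, expanded


def write_outputs(
    category: list[str],
    expanded: dict[str, list[str]],
    category_path: str,
    mappings_path: str,
) -> None:
    """Write both files in the hypothetical layouts described above."""
    with open(category_path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(category) + "\n")
    with open(mappings_path, "w", encoding="utf-8") as fh:
        for gene_id in sorted(expanded):
            fh.write(gene_id + "\t" + "\t".join(expanded[gene_id]) + "\n")
```

The `expand` parameter is where the case and prefix processing plugs in; the default only adds lower- and upper-case forms. Keeping one expansion function for both outputs ensures that every term Textpresso can match also maps back to an ID in the curation form.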

