Specifications for WB gpi file
These specifications are based on the documentation on the GO wiki:
We will need to create a new file with each WormBase release using the information in AceDB and the xrefs file generated for C. elegans that is available on the ftp site:
ftp://ftp.sanger.ac.uk/pub/wormbase/releases/WS236/species/c_elegans/
The file is named according to the release, e.g., c_elegans.WS236.xrefs.txt.gz
(Unfortunately, no single AceDB object or file has all of the information we need.)
Output will be sorted according to ascending WBGene ID.
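
The sort order above can be sketched as follows. This is a minimal illustration, not the production script: it assumes each output line is tab-delimited with the WBGene ID in Column 1, and the function names `wbgene_sort_key` and `sort_gpi_lines` are hypothetical. Sorting on the numeric part of the ID avoids relying on the IDs always being zero-padded to the same width.

```python
import re
from typing import Iterable, List


def wbgene_sort_key(gpi_line: str) -> int:
    """Return the numeric part of the WBGene ID found in Column 1 of a line."""
    db_object_id = gpi_line.split("\t", 1)[0]
    match = re.fullmatch(r"WBGene(\d+)", db_object_id)
    if match is None:
        raise ValueError(f"unexpected DB_Object_ID: {db_object_id!r}")
    return int(match.group(1))


def sort_gpi_lines(lines: Iterable[str]) -> List[str]:
    """Sort GPI data lines by ascending WBGene ID (hypothetical helper)."""
    return sorted(lines, key=wbgene_sort_key)
```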
For CCC curation form:
1) For each gene name identified in a Textpresso sentence, we will look for an exact match in Column 2 and Column 4 of the gpi file.
2) The form will then display, in the left-most box where gene name information is displayed, the gene name mapped to the Parent_Object_ID in Column 7 and the UniProtKB ID in Column 8, e.g. ace-1:WB:WBGene00000035:UniProtKB:P38433.
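
A rough sketch of the two lookup steps above, in Python. It rests on several assumptions that are not stated in this specification: the gpi file is tab-delimited, multiple synonyms in Column 4 are separated by `|`, lines starting with `!` are header comments, the match is case-sensitive, and the displayed label uses the name as it was matched (symbol or synonym). Column 8 is appended as-is, so a gene with several xrefs would need additional handling. The function names are hypothetical.

```python
from typing import Dict, List


def build_name_index(gpi_lines: List[str]) -> Dict[str, List[str]]:
    """Map every name in Column 2 or Column 4 to its display label(s)."""
    index: Dict[str, List[str]] = {}
    for line in gpi_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        cols += [""] * (8 - len(cols))  # pad short rows to eight columns
        symbol, synonyms = cols[1], cols[3]
        parent_id, xrefs = cols[6], cols[7]
        names = [symbol] + [s for s in synonyms.split("|") if s]
        for name in names:
            # Join only the parts that are present, e.g.
            # ace-1:WB:WBGene00000035:UniProtKB:P38433
            label = ":".join(part for part in (name, parent_id, xrefs) if part)
            labels = index.setdefault(name, [])
            if label not in labels:
                labels.append(label)
    return index


def display_labels(gene_name: str, index: Dict[str, List[str]]) -> List[str]:
    """Return the labels for an exact (case-sensitive) name match."""
    return index.get(gene_name, [])
```

A name can map to more than one gene (for example, a synonym shared by two genes), which is why each name maps to a list of labels rather than to a single one.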
column | name | required? | cardinality | GAF column | Example for UniProt | Example for WormBase | Tag in AceDB ?Gene model | Column in xrefs file | Value if not in AceDB ?Gene model or xrefs file |
---|---|---|---|---|---|---|---|---|---|
01 | DB_Object_ID | required | 1 | 2/17 | Q4VCS5-1 | WBGene00000035 | WBGene ID | n/a | n/a |
02 | DB_Object_Symbol | required | 1 | 3 | AMOT | ace-1 | CGC_name; if no CGC_name then Sequence_name | n/a | n/a |
03 | DB_Object_Name | optional | 0 or 1 | 10 | Angiomotin | n/a | n/a | n/a | |
04 | DB_Object_Synonym(s) | optional | 0 or greater | 11 | KIAA1071\|AMOT | ACE1 | Other_name; if value in CGC_name, then also Sequence_name; also take Molecular_name values, but first strip WP: prefix and any numbers after second '.' in transcript names to only take unique CE and transcript names (e.g., WP:CE21219 becomes CE21219 and T28F12.2a.1 becomes T28F12.2a) | n/a | n/a |
05 | DB_Object_Type | required | 1 | 12 | protein | gene | n/a | n/a | gene |
06 | Taxon | required | 1 | 13 | taxon:9606 | taxon:6239 | n/a | n/a | taxon:6239 |
07 | Parent_Object_ID | optional | 0 or 1 | - | UniProtKB:Q4VCS5 | WB:WBGene00000035 | WBGene ID prefaced with WB: | n/a | n/a |
08 | DB_Xref(s) | optional | 0 or greater | - | - | UniProtKB:P38433 | n/a | 8, prefaced with UniProtKB: | n/a |
09 | Gene_Product_Properties | optional | 0 or greater | - | See Note 4 below |
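
The rules for Column 2 (DB_Object_Symbol) and Column 4 (DB_Object_Synonym(s)) in the table above could be implemented along these lines. This is only a sketch: it assumes the AceDB tag values have already been extracted into plain Python strings and lists, that synonyms are joined with `|` as in the UniProt example, and that duplicates are dropped while keeping first-seen order. It does not filter out Molecular_name values that are neither CE nor transcript names. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def normalize_molecular_name(value: str) -> str:
    """Strip a leading 'WP:' and any numeric suffix after the second '.'."""
    if value.startswith("WP:"):
        value = value[len("WP:"):]
    parts = value.split(".")
    # T28F12.2a.1 -> ["T28F12", "2a", "1"]; keep only the first two parts
    if len(parts) > 2 and all(p.isdigit() for p in parts[2:]):
        value = ".".join(parts[:2])
    return value


def build_symbol_and_synonyms(
    cgc_name: Optional[str],
    sequence_name: Optional[str],
    other_names: List[str],
    molecular_names: List[str],
) -> Tuple[str, str]:
    """Return (Column 2 symbol, Column 4 pipe-separated synonyms)."""
    # Column 2: CGC_name; if no CGC_name then Sequence_name
    symbol = cgc_name or sequence_name or ""
    candidates = list(other_names)
    # Sequence_name is a synonym only when CGC_name supplied the symbol
    if cgc_name and sequence_name:
        candidates.append(sequence_name)
    candidates.extend(normalize_molecular_name(m) for m in molecular_names)
    synonyms: List[str] = []
    for name in candidates:
        if name and name not in synonyms:  # keep unique names, in order
            synonyms.append(name)
    return symbol, "|".join(synonyms)
```

For example, with an illustrative Sequence_name of `W09B12.1`, calling `build_symbol_and_synonyms("ace-1", "W09B12.1", ["ACE1"], ["WP:CE21219", "T28F12.2a.1", "T28F12.2a.2"])` collapses the two transcript isoform names into the single synonym `T28F12.2a`.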