Specifications for WB gpi file
These specifications are based on the documentation on the GO wiki.
A new file must be created for each WormBase release, using information from AceDB and from the C. elegans xrefs file, which is available on the FTP site:
ftp://ftp.sanger.ac.uk/pub/wormbase/releases/WS236/species/c_elegans/
The file is named according to the release, e.g., c_elegans.WS236.xrefs.txt.gz
(Unfortunately, no single AceDB object or file contains all of the information we need.)
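The release-based naming convention can be captured in a small helper. The following is a minimal Python sketch, not an existing WormBase script: the function names `xrefs_filename` and `xrefs_url` are hypothetical, and the sketch assumes that the directory layout shown for WS236 also holds for other releases.

```python
# Hypothetical helpers; the FTP layout is assumed to match the WS236 example.
FTP_BASE = "ftp://ftp.sanger.ac.uk/pub/wormbase/releases"


def xrefs_filename(release: str, species: str = "c_elegans") -> str:
    """Return the xrefs file name for a release, e.g. 'WS236'."""
    return f"{species}.{release}.xrefs.txt.gz"


def xrefs_url(release: str, species: str = "c_elegans") -> str:
    """Return the full URL of the xrefs file for a release."""
    directory = f"{FTP_BASE}/{release}/species/{species}/"
    return directory + xrefs_filename(release, species)


print(xrefs_url("WS236"))
# ftp://ftp.sanger.ac.uk/pub/wormbase/releases/WS236/species/c_elegans/c_elegans.WS236.xrefs.txt.gz
```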
Output will be sorted according to ascending WBGene ID.
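As an illustration of the sort order, the sketch below sorts on the numeric part of the WBGene ID. It assumes every ID has the form `WBGene` followed by digits; the helper name `wbgene_sort_key` and all IDs other than WBGene00000035 are made up for the example.

```python
import re

# Assumed ID shape: the literal prefix 'WBGene' followed only by digits.
WBGENE_RE = re.compile(r"^WBGene(\d+)$")


def wbgene_sort_key(gene_id: str) -> int:
    """Return the numeric part of a WBGene ID for use as a sort key."""
    match = WBGENE_RE.match(gene_id)
    if match is None:
        raise ValueError(f"not a WBGene ID: {gene_id!r}")
    return int(match.group(1))


# Illustrative IDs only; they are not taken from a specific release.
ids = ["WBGene00000912", "WBGene00000035", "WBGene00006789"]
sorted_ids = sorted(ids, key=wbgene_sort_key)
print(sorted_ids)
```

If the IDs are always zero-padded to the same width, a plain string sort gives the same order; the numeric key simply does not rely on that.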
column | name | required? | cardinality | GAF column | Example for UniProt | Example for WormBase | Tag in AceDB ?Gene model | Column in xrefs file | Value if not in AceDB ?Gene model or xrefs file |
---|---|---|---|---|---|---|---|---|---|
01 | DB_Object_ID | required | 1 | 2/17 | Q4VCS5-1 | WBGene00000035 | WBGene ID | n/a | n/a |
02 | DB_Object_Symbol | required | 1 | 3 | AMOT | ace-1 | CGC_name; if no CGC_name then Sequence_name | n/a | n/a |
03 | DB_Object_Name | optional | 0 or 1 | 10 | Angiomotin | n/a | n/a | n/a | |
04 | DB_Object_Synonym(s) | optional | 0 or greater | 11 | KIAA1071\|AMOT | ACE1 | Other_name; if the gene has a CGC_name, then also Sequence_name; in all cases, also take Molecular_name values, but first strip the WP: prefix and any number after the second '.' in transcript names, so that only unique CE and transcript names are kept (e.g., WP:CE21219 becomes CE21219 and T28F12.2a.1 becomes T28F12.2a) | | |
05 | DB_Object_Type | required | 1 | 12 | protein | gene | n/a | | gene |
06 | Taxon | required | 1 | 13 | taxon:9606 | taxon:6239 | n/a | | taxon:6239 |
07 | Parent_Object_ID | optional | 0 or 1 | - | UniProtKB:Q4VCS5 | WB:WBGene00000035 | | 2, prefaced with WB: | n/a |
08 | DB_Xref(s) | optional | 0 or greater | - | - | UniProtKB:P38433 | | 8, prefaced with UniProtKB: | n/a |
09 | Gene_Product_Properties | optional | 0 or greater | - | See Note 4 below | | | | |
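The rules in the table for DB_Object_Symbol (column 02) and DB_Object_Synonym(s) (column 04) can be sketched in Python as follows. This is a hedged illustration, not the production script: the function names are hypothetical, the inputs are assumed to have already been read from the ?Gene object as plain strings, and removing duplicates across the whole synonym list is an assumption (the table only asks for unique CE and transcript names). The example record is invented and reuses the name patterns from the table.

```python
import re
from typing import Iterable, List, Optional

# Matches a transcript name with a numeric part after the second '.',
# e.g. T28F12.2a.1, and captures everything before that part.
TRANSCRIPT_SUFFIX_RE = re.compile(r"^([^.]+\.[^.]+)\.\d+$")


def choose_symbol(cgc_name: Optional[str], sequence_name: Optional[str]) -> str:
    """Column 02: CGC_name; if there is no CGC_name, Sequence_name."""
    symbol = cgc_name or sequence_name
    if not symbol:
        raise ValueError("gene has neither CGC_name nor Sequence_name")
    return symbol


def normalize_molecular_name(name: str) -> str:
    """Strip the WP: prefix, or the number after the second '.'."""
    if name.startswith("WP:"):
        return name[len("WP:"):]
    match = TRANSCRIPT_SUFFIX_RE.match(name)
    return match.group(1) if match else name


def build_synonyms(
    other_names: Iterable[str],
    cgc_name: Optional[str],
    sequence_name: Optional[str],
    molecular_names: Iterable[str],
) -> List[str]:
    """Column 04: collect synonyms in order, dropping duplicates."""
    candidates = list(other_names)
    # Sequence_name is a synonym only when CGC_name supplies the symbol.
    if cgc_name and sequence_name:
        candidates.append(sequence_name)
    candidates.extend(normalize_molecular_name(n) for n in molecular_names)
    seen = set()
    synonyms = []
    for name in candidates:
        if name not in seen:  # assumption: de-duplicate the whole list
            seen.add(name)
            synonyms.append(name)
    return synonyms


# Invented record for illustration; not a real WormBase gene.
symbol = choose_symbol("xyz-1", "T28F12.2")
synonyms = build_synonyms(
    other_names=["XYZ1"],
    cgc_name="xyz-1",
    sequence_name="T28F12.2",
    molecular_names=["T28F12.2a.1", "T28F12.2a.2", "WP:CE21219"],
)
print(symbol)              # xyz-1
print("|".join(synonyms))  # XYZ1|T28F12.2|T28F12.2a|CE21219
```

Joining multiple synonyms with `|` follows the UniProt example in the table (KIAA1071|AMOT).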