Difference between revisions of "Specifications for WB gpi file"
From WormBaseWiki
Revision as of 19:40, 20 March 2013
These specifications are based on the documentation on the GO wiki:
We will need to create a new gpi file for each WormBase release, using information from AceDB and from the C. elegans xrefs file that is available on the FTP site:
ftp://ftp.sanger.ac.uk/pub/wormbase/releases/WS236/species/c_elegans/
The file is named according to the release, e.g., c_elegans.WS236.xrefs.txt.gz
(Unfortunately, no single AceDB object or file contains all of the information we need.)
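The release-based naming above can be sketched as a small helper. This is only an illustration: the function names are hypothetical, and it assumes that other releases follow the same directory and file-name pattern shown for WS236.

```python
# Hypothetical helpers; they assume every release uses the WS236 layout shown above.
BASE = "ftp://ftp.sanger.ac.uk/pub/wormbase/releases/{release}/species/c_elegans/"


def xrefs_filename(release):
    """Return the xrefs file name for a release label such as 'WS236'."""
    return "c_elegans.{0}.xrefs.txt.gz".format(release)


def xrefs_url(release):
    """Return the full FTP location of the xrefs file for a release."""
    return BASE.format(release=release) + xrefs_filename(release)


print(xrefs_url("WS236"))
```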
Output will be sorted according to ascending WBGene ID.
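One possible way to get this ordering is to sort on the numeric part of the ID. The sketch below assumes every ID has the form WBGene followed by digits; the helper name and all sample IDs other than WBGene00000035 are made up for illustration.

```python
def wbgene_sort_key(gene_id):
    """Turn 'WBGene00000035' into the integer 35 for sorting."""
    return int(gene_id[len("WBGene"):])


# Sample IDs for illustration only.
ids = ["WBGene00000100", "WBGene00000035", "WBGene00000002"]
ordered = sorted(ids, key=wbgene_sort_key)
print(ordered)
```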
column | name | required? | cardinality | GAF column | Example for UniProt | Example for WormBase | Tag in AceDB ?Gene model | Column in xrefs file | Value if not in AceDB ?Gene model or xrefs file |
---|---|---|---|---|---|---|---|---|---|
01 | DB_Object_ID | required | 1 | 2/17 | Q4VCS5-1 | WBGene00000035 | WBGene ID | n/a | n/a |
02 | DB_Object_Symbol | required | 1 | 3 | AMOT | ace-1 | CGC_name; if no CGC_name then Sequence_name | n/a | n/a |
03 | DB_Object_Name | optional | 0 or 1 | 10 | Angiomotin | n/a | n/a | n/a | |
04 | DB_Object_Synonym(s) | optional | 0 or greater | 11 | AMOT_HUMAN\|KIAA1071\|AMOT | ACE1 | Other_name; if value in CGC_name, then also Sequence_name; for all, also take Molecular_name values, but first strip WP: prefix and any numbers after second '.' in transcript names to only take unique CE and transcript names (e.g., WP:CE21219 becomes CE21219 and T28F12.2a.1 becomes T28F12.2a) | n/a | n/a |
05 | DB_Object_Type | required | 1 | 12 | protein | gene | n/a | n/a | gene |
06 | Taxon | required | 1 | 13 | taxon:9606 | taxon:6239 | n/a | n/a | taxon:6239 |
07 | Parent_Object_ID | optional | 0 or 1 | - | UniProtKB:Q4VCS5 | WB:WBGene00000035 | WBGene ID prefaced with WB: | n/a | n/a |
08 | DB_Xref(s) | optional | 0 or greater | - | - | UniProtKB:P38433 | n/a | 8, prefaced with UniProtKB: | n/a |
09 | Gene_Product_Properties | optional | 0 or greater | - | See Note 4 below | | | | |
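The column rules in the table can be sketched roughly as follows. This is not the actual WormBase export script: the dictionary layout of the gene record, the function names, and the sample Sequence_name are hypothetical, and the tab-delimited output with pipe-separated multi-value fields is assumed from the GO gpi format. "Any numbers after second '.'" is interpreted here as dropping everything from the second '.' onward.

```python
def normalize_molecular_name(name):
    """Strip a leading 'WP:' and anything from the second '.' onward."""
    if name.startswith("WP:"):
        name = name[len("WP:"):]
    return ".".join(name.split(".")[:2])


def gpi_row(gene, uniprot_accessions):
    """Build one gpi line from a gene record (a dict with a hypothetical layout)."""
    gene_id = gene["id"]
    cgc = gene.get("CGC_name")
    seq = gene.get("Sequence_name")
    symbol = cgc or seq  # column 02: CGC_name, else Sequence_name

    synonyms = list(gene.get("Other_name", []))
    if cgc and seq:
        synonyms.append(seq)  # Sequence_name is a synonym only when a CGC_name exists
    synonyms.extend(normalize_molecular_name(m) for m in gene.get("Molecular_name", []))
    unique = list(dict.fromkeys(synonyms))  # drop duplicates, keep first-seen order

    columns = [
        gene_id,                                                # 01 DB_Object_ID
        symbol,                                                 # 02 DB_Object_Symbol
        "",                                                     # 03 DB_Object_Name
        "|".join(unique),                                       # 04 DB_Object_Synonym(s)
        "gene",                                                 # 05 DB_Object_Type
        "taxon:6239",                                           # 06 Taxon
        "WB:" + gene_id,                                        # 07 Parent_Object_ID
        "|".join("UniProtKB:" + a for a in uniprot_accessions), # 08 DB_Xref(s)
        "",                                                     # 09 Gene_Product_Properties
    ]
    return "\t".join(columns)


# Illustrative record; 'EXAMPLE.1' is a placeholder, not a real Sequence_name.
gene = {
    "id": "WBGene00000035",
    "CGC_name": "ace-1",
    "Sequence_name": "EXAMPLE.1",
    "Other_name": ["ACE1"],
    "Molecular_name": ["WP:CE21219", "T28F12.2a.1", "T28F12.2a"],
}
row = gpi_row(gene, ["P38433"])
print(row)
```

Here the two transcript entries collapse to a single T28F12.2a synonym, which is the "unique CE and transcript names" behavior described in row 04.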