Specifications for WB gpi file

From WormBaseWiki
Revision as of 19:24, 20 March 2013 by Vanaukenk (talk | contribs)

These specifications are based on the documentation on the GO wiki:

http://wiki.geneontology.org/index.php/Final_GPAD_and_GPI_file_format#Final_format_.2809_Jan_2013.29_2

A new gpi file needs to be created for each WormBase release, using information from AceDB and from the C. elegans xrefs file that is available on the FTP site:

ftp://ftp.sanger.ac.uk/pub/wormbase/releases/WS236/species/c_elegans/

The file is named according to the release, e.g., c_elegans.WS236.xrefs.txt.gz

(Unfortunately, there is no single AceDB object or file that contains all of the information we need.)
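Because the information has to be combined from more than one source, a first step could be to pull the UniProt accessions out of the xrefs file, keyed by WBGene ID. The following is only a rough sketch in Python. It assumes that the xrefs file is tab-delimited, that lines starting with "//" or "#" are comments, and that "." marks an empty value; none of these is stated in this specification. Column 2 (WBGene ID) and column 8 (UniProt accession) are taken from the table below. The function names are hypothetical.

```python
import gzip
from collections import defaultdict


def collect_uniprot_xrefs(lines):
    """Map each WBGene ID (xrefs column 2) to its UniProt accessions (column 8).

    Assumptions (not confirmed by the specification): tab-delimited rows,
    comment lines starting with '//' or '#', and '.' as the empty value.
    """
    xrefs = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("//", "#")):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # row too short to hold column 8
        gene_id, accession = cols[1], cols[7]
        accessions = xrefs[gene_id]  # registers the gene even without an accession
        if accession not in ("", "."):
            accessions.add(accession)
    return {gene: sorted(accs) for gene, accs in xrefs.items()}


def read_xrefs_file(path):
    """Read a gzipped xrefs file, e.g. c_elegans.WS236.xrefs.txt.gz."""
    with gzip.open(path, "rt") as handle:
        return collect_uniprot_xrefs(handle)


# Made-up rows; only columns 2 and 8 carry meaningful values here.
example_lines = [
    "// header line\n",
    "x\tWBGene00000035\tx\tx\tx\tx\tx\tP38433\n",
    "x\tWBGene00000035\tx\tx\tx\tx\tx\t.\n",
    "x\tWBGene00000001\tx\tx\tx\tx\tx\t.\n",
]
example_xrefs = collect_uniprot_xrefs(example_lines)
```

Genes without any UniProt accession are kept with an empty list, so that they can still be written out with an empty DB_Xref(s) column.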

Output will be sorted according to ascending WBGene ID.
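As an illustration of the sort order, the numeric part of the WBGene ID can be used as the sort key. This is a minimal sketch; the helper name is hypothetical, and it assumes that every ID has the form "WBGene" followed by digits, as in WBGene00000035.

```python
def wbgene_sort_key(gene_id):
    """Return the numeric part of an ID such as 'WBGene00000035'."""
    prefix = "WBGene"
    if not gene_id.startswith(prefix):
        raise ValueError(f"not a WBGene ID: {gene_id!r}")
    return int(gene_id[len(prefix):])


gene_ids = ["WBGene00000912", "WBGene00000035", "WBGene00000001"]
sorted_ids = sorted(gene_ids, key=wbgene_sort_key)
```

If all IDs are zero-padded to the same length, a plain string sort gives the same order; the numeric key just avoids relying on the padding.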


column name required? cardinality GAF column Example for UniProt Example for WormBase Tag in AceDB ?Gene model Column in xrefs file Value if not in AceDB ?Gene model or xrefs file
01 DB_Object_ID required 1 2/17 Q4VCS5-1 WBGene00000035 WBGene ID n/a n/a
02 DB_Object_Symbol required 1 3 AMOT ace-1 CGC_name; if no CGC_name then Sequence_name
03 DB_Object_Name optional 0 or 1 10 Angiomotin n/a n/a
04 DB_Object_Synonym(s) optional 0 or greater 11 KIAA1071|AMOT ACE1
05 DB_Object_Type required 1 12 protein gene n/a gene
06 Taxon required 1 13 taxon:9606 taxon:6239 n/a taxon:6239
07 Parent_Object_ID optional 0 or 1 - UniProtKB:Q4VCS5 WB:WBGene00000035 2, prefaced with WB: n/a
08 DB_Xref(s) optional 0 or greater - - UniProtKB:P38433 8, prefaced with UniProtKB: n/a
09 Gene_Product_Properties optional 0 or greater - See Note 4 below
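To make the column rules above concrete, here is a hedged sketch of how one line of the gpi file might be assembled in Python. It reflects one reading of the table: the symbol is the CGC_name, falling back to the Sequence_name; DB_Object_Name is left empty; the type and taxon are the fixed values "gene" and "taxon:6239"; Parent_Object_ID is the WBGene ID prefaced with "WB:"; and each UniProt accession is prefaced with "UniProtKB:". The function name and its parameters are hypothetical. Joining multiple values with "|" follows the synonym example in the table and is assumed to apply to DB_Xref(s) as well. Gene_Product_Properties is left empty because it depends on Note 4.

```python
def gpi_row(gene_id, cgc_name, sequence_name, synonyms, uniprot_accessions):
    """Build one tab-delimited gpi line (9 columns) for a WormBase gene."""
    # Column 02: CGC_name; if no CGC_name then Sequence_name.
    symbol = cgc_name or sequence_name
    if not symbol:
        raise ValueError(f"no CGC_name or Sequence_name for {gene_id}")
    fields = [
        gene_id,                      # 01 DB_Object_ID
        symbol,                       # 02 DB_Object_Symbol
        "",                           # 03 DB_Object_Name (left empty here)
        "|".join(synonyms),           # 04 DB_Object_Synonym(s)
        "gene",                       # 05 DB_Object_Type
        "taxon:6239",                 # 06 Taxon
        f"WB:{gene_id}",              # 07 Parent_Object_ID
        "|".join(f"UniProtKB:{acc}" for acc in uniprot_accessions),  # 08 DB_Xref(s)
        "",                           # 09 Gene_Product_Properties (see Note 4)
    ]
    return "\t".join(fields)


# Values taken from the WormBase examples in the table.
ace1_row = gpi_row("WBGene00000035", "ace-1", None, ["ACE1"], ["P38433"])

# Hypothetical gene without a CGC_name: the Sequence_name is used instead.
fallback_row = gpi_row("WBGene99999999", None, "X1.1", [], [])
```

Optional columns with no value are written as empty strings so that every line still has nine tab-separated fields.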