Specifications for WB gpi file

These specifications are based on the documentation on the GO wiki:

http://wiki.geneontology.org/index.php/Final_GPAD_and_GPI_file_format#Final_format_.2809_Jan_2013.29_2

We will need to generate a new file for each WormBase release, using the information in AceDB together with the C. elegans xrefs file that is available on the FTP site:

ftp://ftp.sanger.ac.uk/pub/wormbase/releases/WS236/species/c_elegans/

The file is named according to the release, e.g., c_elegans.WS236.xrefs.txt.gz
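
As a minimal sketch of the naming convention above, the xrefs file name and location for a given release could be derived as shown here. The function names are hypothetical, and the code assumes the file sits directly inside the species directory shown in the URL above.

```python
# Hypothetical helpers; only the naming pattern and the base URL come from this page.
FTP_BASE = "ftp://ftp.sanger.ac.uk/pub/wormbase/releases"


def xrefs_filename(release: str, species: str = "c_elegans") -> str:
    """Return the xrefs file name for a release, e.g. WS236."""
    return f"{species}.{release}.xrefs.txt.gz"


def xrefs_url(release: str, species: str = "c_elegans") -> str:
    """Return the full URL, assuming the file is directly in the species directory."""
    directory = f"{FTP_BASE}/{release}/species/{species}"
    return f"{directory}/{xrefs_filename(release, species)}"


print(xrefs_filename("WS236"))
```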

(Unfortunately, no single AceDB object or file contains all of the information we need.)

Output will be sorted according to ascending WBGene ID.
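
One possible way to do the sort, assuming each output row is a list of column values with the WBGene ID in the first position (the function names and row layout are hypothetical). The numeric part of the ID is used as the key, so the order does not depend on the IDs being zero-padded to the same width.

```python
# Hypothetical sketch: sort GPI rows by the numeric part of the WBGene ID.
PREFIX = "WBGene"


def wbgene_sort_key(gene_id: str) -> int:
    """Return the integer part of an ID such as WBGene00000035."""
    if not gene_id.startswith(PREFIX):
        raise ValueError(f"not a WBGene ID: {gene_id!r}")
    return int(gene_id[len(PREFIX):])


def sort_rows(rows):
    """Return rows in ascending WBGene ID order; column 01 is row[0]."""
    return sorted(rows, key=lambda row: wbgene_sort_key(row[0]))
```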

column	name	required?	cardinality	GAF column	Example for UniProt	Example for WormBase	Tag in AceDB ?Gene model	Column in xrefs file	Value if not in AceDB ?Gene model or xrefs file
01	DB_Object_ID	required	1	2/17	Q4VCS5-1	WBGene00000035	WBGene ID	n/a	n/a
02	DB_Object_Symbol	required	1	3	AMOT	ace-1	CGC_name; if no CGC_name then Sequence_name	n/a	n/a
03	DB_Object_Name	optional	0 or 1	10	Angiomotin	n/a	n/a	n/a
04	DB_Object_Synonym(s)	optional	0 or greater	11	KIAA1071|AMOT	ACE1	Other_name; if value in CGC_name, then also Sequence_name; for all, also take Molecular_name values, but first strip WP: prefix and any numbers after second '.' in transcript names to only take unique CE and transcript names (e.g., WP:CE21219 becomes CE21219 and T28F12.2a.1 becomes T28F12.2a)
05	DB_Object_Type	required	1	12	protein	gene	n/a	gene
06	Taxon	required	1	13	taxon:9606	taxon:6239	n/a	taxon:6239
07	Parent_Object_ID	optional	0 or 1	-	UniProtKB:Q4VCS5	WB:WBGene00000035	2, prefaced with WB:	n/a
08	DB_Xref(s)	optional	0 or greater	-	-	UniProtKB:P38433	8, prefaced with UniProtKB:	n/a
09	Gene_Product_Properties	optional	0 or greater	-	See Note 4 below
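
The symbol and synonym rules in rows 02 and 04 could be sketched roughly as follows. This is an illustration only: the function names are hypothetical, the tag values are assumed to have been pulled from the ?Gene object already as plain strings, and "any numbers after second '.'" is interpreted here as a trailing all-digit component. Joining synonyms with '|' follows the UniProt example in the table.

```python
import re

# Matches names such as T28F12.2a.1 and captures the part before the second '.'.
TRANSCRIPT_SUFFIX = re.compile(r"^([^.]+\.[^.]+)\.\d+$")


def normalize_molecular_name(name: str) -> str:
    """Strip a leading WP: prefix and a numeric suffix after the second '.'."""
    if name.startswith("WP:"):
        name = name[len("WP:"):]
    return TRANSCRIPT_SUFFIX.sub(r"\1", name)


def choose_symbol(cgc_name, sequence_name):
    """Column 02: CGC_name if present, otherwise Sequence_name."""
    return cgc_name if cgc_name else sequence_name


def collect_synonyms(cgc_name, sequence_name, other_names, molecular_names):
    """Column 04: Other_name values, Sequence_name when a CGC_name exists,
    and normalized Molecular_name values, with duplicates removed."""
    candidates = list(other_names)
    if cgc_name and sequence_name:
        candidates.append(sequence_name)
    candidates.extend(normalize_molecular_name(m) for m in molecular_names)

    unique = []
    seen = set()
    for value in candidates:
        if value and value not in seen:
            seen.add(value)
            unique.append(value)
    return unique


def synonym_field(synonyms):
    """Join synonyms with '|' for the output column."""
    return "|".join(synonyms)


print(normalize_molecular_name("WP:CE21219"))   # CE21219
print(normalize_molecular_name("T28F12.2a.1"))  # T28F12.2a
```

Removing duplicates after normalization is what collapses several transcript names (e.g., T28F12.2a.1 and T28F12.2a.2) into a single synonym.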
