Specifications for WB gpi file

These specifications are based on the documentation on the GO wiki:

http://wiki.geneontology.org/index.php/Final_GPAD_and_GPI_file_format#Final_format_.2809_Jan_2013.29_2

With each WormBase release, we will need to create a tab-delimited file from the information in AceDB and the xrefs file generated for C. elegans, which is available on the FTP site:

ftp://ftp.sanger.ac.uk/pub/wormbase/releases/WS236/species/c_elegans/

The file is named according to the release, e.g., c_elegans.WS236.xrefs.txt.gz
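A minimal Python sketch of loading the xrefs file and grouping its rows by gene, so that each WBGene ID can later be joined to its Table Maker record. It assumes the file is tab-delimited and that the WBGene ID is in column 2; this spec does not say which column holds the gene ID, so GENE_COL must be checked against the real file. GENE_COL, group_xrefs and load_xrefs are hypothetical names.

```python
import csv
import gzip
from collections import defaultdict

# ASSUMPTION: WBGene ID is in column 2 (1-based) of xrefs.txt; verify this.
GENE_COL = 2


def group_xrefs(lines, gene_col=GENE_COL):
    """Group xrefs rows by WBGene ID: {wbgene_id: [row, ...]}.

    `lines` is any iterable of text lines (an open file works).
    """
    by_gene = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        # Skip blank, too-short and (defensively) '#' comment lines.
        if len(row) < gene_col or row[0].startswith("#"):
            continue
        by_gene[row[gene_col - 1]].append(row)
    return dict(by_gene)


def load_xrefs(path, gene_col=GENE_COL):
    """Read xrefs.txt or xrefs.txt.gz from disk and group it by gene."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as fh:
        return group_xrefs(fh, gene_col)
```

Keeping the parsing in group_xrefs, separate from file opening, lets the same code read the compressed release file or an uncompressed copy.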

(Unfortunately there is no one AceDB object or file that has all of the information we need.)

Output will be sorted according to ascending WBGene ID and will contain a two-line header:

!gpi-version: 1.1

!namespace: WB
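Assuming the rows are already assembled as lists of nine column strings, the header and the ascending WBGene ID order can be produced as sketched below (gpi_lines and write_gpi are hypothetical names). Sorting on the numeric part of the ID is a defensive choice; because WBGene IDs are zero-padded, a plain string sort would normally give the same order.

```python
import sys

GPI_HEADER = ["!gpi-version: 1.1", "!namespace: WB"]


def wbgene_sort_key(row):
    # Column 1 holds e.g. "WBGene00000035"; sort on its numeric part.
    return int(row[0][len("WBGene"):])


def gpi_lines(rows):
    """Yield the two header lines, then tab-joined rows sorted by WBGene ID."""
    yield from GPI_HEADER
    for row in sorted(rows, key=wbgene_sort_key):
        yield "\t".join(row)


def write_gpi(rows, out=sys.stdout):
    """Write the complete gpi file to `out` (stdout by default)."""
    for line in gpi_lines(rows):
        out.write(line + "\n")
```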

For CCC curation form:

1) For each gene name identified in a Textpresso sentence, we will look in Column 2 and Column 4 of the gpi file for an exact match.
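One way to do the exact-match lookup is to index every gpi row once by its symbol (Column 2) and by each pipe-separated synonym (Column 4). This Python sketch assumes rows are lists of nine column strings; build_name_index and lookup are hypothetical names, and treating "exact" as case-sensitive is our reading of the requirement.

```python
from collections import defaultdict


def build_name_index(gpi_rows):
    """Map each name in Column 2 or Column 4 to the Column 1 IDs carrying it."""
    index = defaultdict(list)
    for row in gpi_rows:
        names = [n for n in [row[1], *row[3].split("|")] if n]
        for name in dict.fromkeys(names):  # de-duplicate, keep order
            index[name].append(row[0])
    return index


def lookup(index, gene_name):
    # Exact, case-sensitive match; a name may map to several genes.
    return index.get(gene_name, [])
```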

2) For each match, the form will then display, in the left-most box where gene name information is shown, the gene name mapped to the parent object ID in Column 7 and the UniProtKB: ID in Column 8, e.g. ace-1:WB:WBGene00000035:UniProtKB:P38433. These mappings are made static at the moment we get the sentence from Textpresso; dealing with changing mappings between gene names and DBIDs or UniProt IDs is beyond the scope of what we want to handle. The sentence can be run through Textpresso again in the future with the then-current mapping and re-curated. (K&J) Always show all sentences from Textpresso, even if no genes map to a DBID or a UniProt ID. If that means there are no genes at all in the gene select box, or if there is curation to a gene not in the list, add a comment to the sentence in curation and add it to ptgo directly. (K&J)
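A sketch of building the label for the gene select box from a matched gpi row. The spec does not say what to show when Column 8 has no UniProtKB xref, or has several; the choices in the code (one label per UniProtKB ID, and a label without the UniProt part when there is none) are assumptions, and display_labels is a hypothetical name.

```python
def display_labels(gene_name, gpi_row):
    """Build labels like 'ace-1:WB:WBGene00000035:UniProtKB:P38433'.

    ASSUMPTIONS: one label per UniProtKB xref in Column 8; if there is
    none, the label stops after the Column 7 parent ID.
    """
    parent = gpi_row[6]  # Column 7, e.g. 'WB:WBGene00000035'
    uniprot = [x for x in gpi_row[7].split("|") if x.startswith("UniProtKB:")]
    if not uniprot:
        return [f"{gene_name}:{parent}"]
    return [f"{gene_name}:{parent}:{x}" for x in uniprot]
```

Because the labels are computed once when the sentence is fetched, they stay fixed even if the gpi file later changes, which matches the static-mapping decision above.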

3) Note that the xrefs file is not sorted in order of WBGene ID, but the gpi file should be.

4) Where it is stated that a column can have one or more values, e.g. 'with', DB_Object_Synonym(s), DB_Xref(s), the values should be given as a pipe-separated list.
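A minimal helper for these multi-valued columns, assuming empty values and duplicates should be dropped and first-seen order kept (pipe_join is a hypothetical name):

```python
def pipe_join(values):
    """Join column values with '|', skipping empties and repeats."""
    # dict.fromkeys keeps insertion order while removing duplicates.
    return "|".join(dict.fromkeys(v for v in values if v))
```

A column with no values becomes an empty string, which leaves the tab-delimited field blank.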

From Table Maker:

Col 1: Class Gene

Col 2: Status

Col 3: CGC_name

Col 4: Sequence_name

Col 5: Other_name

Col 6: Corresponding_pseudogene

Filter out genes that have Status = Dead or Status = Suppressed and filter out genes that have a value for Corresponding_pseudogene
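A sketch of the filtering step in Python. It assumes the Table Maker export is tab-delimited in the column order listed above, that a gene with several values for a tag (e.g. several Other_name entries) appears on several rows, and that values may be wrapped in double quotes; load_live_genes is a hypothetical name.

```python
import csv
from collections import defaultdict

FIELDS = ["Gene", "Status", "CGC_name", "Sequence_name",
          "Other_name", "Corresponding_pseudogene"]


def load_live_genes(lines):
    """Collapse Table Maker rows to one record per gene, then filter.

    Each record maps a tag name to the set of its values. Genes with
    Status Dead or Suppressed, or with any Corresponding_pseudogene,
    are dropped.
    """
    genes = defaultdict(lambda: {f: set() for f in FIELDS[1:]})
    # csv.reader strips the surrounding double quotes from quoted fields.
    for row in csv.reader(lines, delimiter="\t"):
        if not row or not row[0]:
            continue
        row = (row + [""] * len(FIELDS))[:len(FIELDS)]  # pad short rows
        rec = genes[row[0]]
        for field, value in zip(FIELDS[1:], row[1:]):
            if value:
                rec[field].add(value)
    return {
        gene: rec for gene, rec in genes.items()
        if not rec["Status"] & {"Dead", "Suppressed"}
        and not rec["Corresponding_pseudogene"]
    }
```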

Use the Table Maker file and the xrefs.txt file to create the gpi file as outlined below.

column	name	required?	cardinality	GAF column	Example for UniProt	Example for WormBase	Tag in AceDB ?Gene model	Column in xrefs file	Value if not in AceDB ?Gene model or xrefs file
01	DB_Object_ID	required	1	2/17	Q4VCS5-1	WBGene00000035	WBGene ID	n/a	n/a
02	DB_Object_Symbol	required	1	3	AMOT	ace-1	n/a	CGC_name, if no CGC_name then Sequence_name	n/a
03	DB_Object_Name	optional	0 or 1	10	Angiomotin	n/a	n/a	n/a	n/a
04	DB_Object_Synonym(s)	optional	0 or greater	11	KIAA1071|AMOT	ACE1	If CGC_name, then Sequence_name AND Other_name; if no CGC_name, but Sequence_name, then Other_name; for all, check Column 4 of xrefs.txt, strip number after second '.' and add resulting unique values; for all, check Column 5 of xrefs.txt, add unique values prefaced with 'WP:'	n/a	n/a
05	DB_Object_Type	required	1	12	protein	gene	n/a	n/a	gene
06	Taxon	required	1	13	taxon:9606	taxon:6239	n/a	n/a	taxon:6239
07	Parent_Object_ID	optional	0 or 1	-	UniProtKB:Q4VCS5	WB:WBGene00000035	n/a	WBGene ID prefaced with 'WB:'	n/a
08	DB_Xref(s)	optional	0 or greater	-	-	UniProtKB:P38433	n/a	for each WBGene, add unique values from xrefs.txt as follows: Column 7 prefaced with 'CCD:' and Column 8 prefaced with 'UniProtKB:'	n/a
09	Gene_Product_Properties	optional	0 or greater	-	See Note 4 below	n/a	n/a	n/a	n/a
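The rules in the table can be sketched as a single row-building function. This is an illustration, not the production script: build_gpi_row and its helpers are hypothetical, xrefs.txt columns are addressed 1-based as in the table, and treating '.' as an empty xrefs field is an assumption. Columns 3 and 9 are left empty because the table gives n/a for WormBase.

```python
MISSING = {"", "."}  # ASSUMPTION: '.' marks an empty xrefs field


def _unique(values):
    # Drop empty/missing values and duplicates, keeping first-seen order.
    return list(dict.fromkeys(v for v in values if v and v not in MISSING))


def _col(row, n):
    # 1-based column access that tolerates short rows.
    return row[n - 1] if len(row) >= n else ""


def strip_after_second_dot(name):
    # 'F10E9.6a.1' -> 'F10E9.6a'; names with fewer than two dots are unchanged.
    return ".".join(name.split(".")[:2])


def build_gpi_row(gene_id, cgc_name, sequence_name, other_names, xref_rows):
    """Return the nine gpi columns for one live gene."""
    symbol = cgc_name or sequence_name  # Column 2
    # Column 4: Sequence_name only when a CGC_name is the symbol.
    synonyms = ([sequence_name] if cgc_name else []) + list(other_names or [])
    synonyms += [strip_after_second_dot(_col(r, 4)) for r in xref_rows
                 if _col(r, 4) not in MISSING]
    synonyms += ["WP:" + _col(r, 5) for r in xref_rows
                 if _col(r, 5) not in MISSING]
    # Column 8: xrefs Column 7 as CCD:, xrefs Column 8 as UniProtKB:.
    db_xrefs = (["CCD:" + _col(r, 7) for r in xref_rows
                 if _col(r, 7) not in MISSING]
                + ["UniProtKB:" + _col(r, 8) for r in xref_rows
                   if _col(r, 8) not in MISSING])
    return [
        gene_id, symbol, "", "|".join(_unique(synonyms)),
        "gene", "taxon:6239", "WB:" + gene_id,
        "|".join(_unique(db_xrefs)), "",
    ]
```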
