gpi File

We will need to create a tab-delimited gpi file for each WormBase release.

Specifications Source

These specifications are based, in part, on the documentation in the GOC's go-annotation GitHub repository:

https://github.com/geneontology/go-annotation/blob/master/specs/gpad-gpi-1_2.md

and also on the content of files submitted here:

http://viewvc.geneontology.org/viewvc/GO-SVN/trunk/gpad-gpi/submission/

File Name

wb_nematode.gpi.gz (all nematode species in WB)

Header

Header content:

!gpi-version: 1.2

!Project_name: WormBase

!!WB_release: WS256

!Contact Email: help@wormbase.org

!URL: http://www.wormbase.org

!Date: 20161006
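A minimal sketch of writing the file named above with this header, assuming Python 3.9+ and that the data rows have already been built as lists of column strings. The function names `gpi_header` and `write_gpi` are hypothetical. The header text is copied verbatim from the lines above, and only the release and date vary.

```python
import gzip
from datetime import date


def gpi_header(release: str, generated: date) -> list[str]:
    """Return the header lines listed above, filled in for one release."""
    # Text copied verbatim from the header content above; only the
    # WB release and the date change from release to release.
    return [
        "!gpi-version: 1.2",
        "!Project_name: WormBase",
        f"!!WB_release: {release}",
        "!Contact Email: help@wormbase.org",
        "!URL: http://www.wormbase.org",
        f"!Date: {generated.strftime('%Y%m%d')}",
    ]


def write_gpi(path: str, release: str, generated: date,
              rows: list[list[str]]) -> None:
    """Write a gzip-compressed, tab-delimited GPI file: header first, then one line per row."""
    with gzip.open(path, "wt", encoding="utf-8", newline="\n") as fh:
        for line in gpi_header(release, generated):
            fh.write(line + "\n")
        for row in rows:
            fh.write("\t".join(row) + "\n")
```

For example, `write_gpi("wb_nematode.gpi.gz", "WS256", date(2016, 10, 6), rows)` would reproduce the header shown above, followed by rows built according to the Field Values table.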

Field Values

column	name	required?	cardinality	GAF column	Example for WormBase	Tag in AceDB ?Gene model (Column in TM output)	Column in Reference Protein Source File (xrefs.txt)	Value if not in source files (i.e. fixed value)
01	DB	required	1	1	WB	n/a	n/a	WB
02	DB_Object_ID	required	1	2/17	WBGene00000035	Genes_elegans, Column 1 of TM output	n/a	n/a
03	DB_Object_Symbol	required	1	3	ace-1	CGC_name, Column 2 of TM output; if no CGC_name, then Sequence_name, Column 3 of TM output	n/a	n/a
04	DB_Object_Name	optional	0 or 1	10	n/a	n/a	n/a	n/a
05	DB_Object_Synonym(s)	optional	0 or greater	11	-	If a CGC_name exists: Sequence_name (Column 3 of TM output) AND Other_name (Column 4 of TM output); if there is no CGC_name but there is a Sequence_name: Other_name (Column 4 of TM output)	For all genes, also check Column 4 of xrefs.txt: if the entry contains a number AND a lower-case letter after the first '.', strip the number after the second '.' (if there is one) and add all resulting unique values; otherwise skip this column. For all genes, also check Column 5 of xrefs.txt: if a value exists, add the unique values prefixed with 'WP:'; if there are no values, skip	n/a
06	DB_Object_Type	required	1	12	gene	n/a	n/a	gene
07	Taxon	required	1	13	taxon:6239	n/a	n/a	taxon:6239
08	Parent_Object_ID	optional	0 or 1	-	WB:WBGene00000035	WBGene ID from Column 1 of TM output, prefixed with 'WB:'	n/a	n/a
09	DB_Xref(s)	optional	0 or greater	-	UniProtKB:P38433	n/a	For all genes, if a value exists, add the unique values from xrefs.txt as follows: Column 7 prefixed with 'CCD:' and Column 8 prefixed with 'UniProtKB:'; if there are no values, skip	n/a
10	Gene_Product_Properties	optional	0 or greater	-	See Note 4 below	n/a	n/a	n/a
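A minimal sketch of how one WormBase row could be assembled from the rules in this table, assuming Python 3.9+. Several things are assumptions not stated in the table: the TM output columns arrive already split per gene, the matching xrefs.txt rows have already been grouped by gene, an empty string means "no value", and multiple values are joined with '|' (the GPI convention for multi-valued fields). The Column 4 rule is read as "the part after the first '.' starts with a number followed by a lower-case letter", e.g. an isoform-style name such as X00A0.1a.1 becoming X00A0.1a. The names `column4_synonym` and `wb_gpi_row`, and the example sequence names, are hypothetical.

```python
import re
from typing import Optional

# Assumed reading of the Column 4 rule: after the first '.', a number
# immediately followed by a lower-case letter (e.g. "X00A0.1a...").
_HAS_NUMBER_AND_LETTER = re.compile(r"^[^.]+\.\d+[a-z]")
# A trailing number after a second '.', e.g. the ".1" in "X00A0.1a.1".
_TRAILING_NUMBER = re.compile(r"^([^.]+\.[^.]+)\.\d+$")


def column4_synonym(value: str) -> Optional[str]:
    """Apply the xrefs.txt Column 4 rule; return None when the entry is skipped."""
    if not _HAS_NUMBER_AND_LETTER.match(value):
        return None
    return _TRAILING_NUMBER.sub(r"\1", value)


def _unique(values: list[str]) -> list[str]:
    """Drop empty values and duplicates, keeping first-seen order."""
    return list(dict.fromkeys(v for v in values if v))


def wb_gpi_row(gene_id: str, cgc_name: str, sequence_name: str,
               other_names: list[str], xref_rows: list[list[str]]) -> list[str]:
    """Build the ten GPI 1.2 columns for one gene (empty string = no value)."""
    symbol = cgc_name or sequence_name                 # 03: CGC_name, else Sequence_name
    synonyms = [sequence_name] if cgc_name else []     # 05: Sequence_name only if a CGC_name exists
    synonyms += other_names                            # 05: Other_name in both cases
    xrefs: list[str] = []
    for cols in xref_rows:                             # table columns are 1-based, lists are 0-based
        syn = column4_synonym(cols[3]) if cols[3] else None
        if syn:
            synonyms.append(syn)                       # Column 4
        if cols[4]:
            synonyms.append("WP:" + cols[4])           # Column 5
        if cols[6]:
            xrefs.append("CCD:" + cols[6])             # Column 7
        if cols[7]:
            xrefs.append("UniProtKB:" + cols[7])       # Column 8
    return [
        "WB", gene_id, symbol, "",                     # 01-04 (no DB_Object_Name)
        "|".join(_unique(synonyms)),                   # 05
        "gene", "taxon:6239", "WB:" + gene_id,         # 06-08
        "|".join(_unique(xrefs)), "",                  # 09-10
    ]
```

Deduplicating with `dict.fromkeys` keeps the first-seen order, so the synonyms from TM output stay ahead of those from xrefs.txt, and a protein listed on several xrefs.txt rows is only added once.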

MGI

These specifications are based on the documentation on the GO wiki:

http://wiki.geneontology.org/index.php/Final_GPAD_and_GPI_file_format#Final_format_.2809_Jan_2013.29_2

The input file is a comma-separated text file converted from an Excel spreadsheet sent to us by MGI.

The file is called mgi_gpi_input.txt and is located on mangolassi at: /home/acedb/kimberly/ccc/ccc_gpi/mgi/input_files

The file contains carriage-return characters, which may need to be removed.

There are duplicate entries in the mgi_gpi_input file that we will need to remove before we create the gpi file.
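A minimal sketch of the cleanup step (stray carriage returns and duplicate entries), assuming the input is UTF-8 and has no header row; both are assumptions, since Excel exports may use another encoding or include column names. Opening the file with `newline=""` lets Python's `csv` module accept '\r', '\n' or '\r\n' line endings, and any '\r' left inside a field is stripped. `read_mgi_input` is a hypothetical name.

```python
import csv


def read_mgi_input(path: str) -> list[list[str]]:
    """Read mgi_gpi_input.txt, strip carriage returns, and drop exact duplicate rows."""
    seen: set[tuple[str, ...]] = set()
    rows: list[list[str]] = []
    # newline="" lets the csv reader handle '\r', '\n' and '\r\n' line endings.
    with open(path, newline="", encoding="utf-8") as fh:
        for raw in csv.reader(fh):
            row = tuple(field.replace("\r", "").strip() for field in raw)
            if not any(row) or row in seen:   # skip blank lines and repeats
                continue
            seen.add(row)
            rows.append(list(row))            # keeps the first occurrence's order
    return rows
```

Only rows that are identical after cleanup are treated as duplicates. Entries that differ in any column would need a separate decision.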

column	name	required?	cardinality	GAF column	Example for MGI	Column in mgi_gpi_input	Value if not in mgi_gpi_input (i.e. fixed value)
01	DB_Object_ID	required	1	2/17	1861229	Column 1, stripped of 'MGI:' prefix	n/a
02	DB_Object_Symbol	required	1	3	Adam21	Column 2, no changes	n/a
03	DB_Object_Name	optional	0 or 1	10	a disintegrin and metallopeptidase domain 21	Column 4, no changes	n/a
04	DB_Object_Synonym(s)	optional	0 or greater	11	n/a	n/a	n/a
05	DB_Object_Type	required	1	12	n/a	n/a	gene
06	Taxon	required	1	13	n/a	n/a	taxon:10090
07	Parent_Object_ID	optional	0 or 1	-	MGI:1861229	Column 1, no changes	n/a
08	DB_Xref(s)	optional	0 or greater	-	UniProtKB:Q9JI76	Column 3, add 'UniProtKB:' as prefix (note there are some entries with no value in Column 3)	n/a
09	Gene_Product_Properties	optional	0 or greater	-	n/a	n/a	n/a
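A minimal sketch of the column mapping in this table for one cleaned input row, assuming Python 3.9+ (for `str.removeprefix`), that Column 3 holds at most one UniProtKB accession, and that an empty string stands for "no value". `mgi_gpi_row` is a hypothetical name.

```python
def mgi_gpi_row(row: list[str]) -> list[str]:
    """Map one mgi_gpi_input row (Columns 1-4) to the nine GPI columns in the table."""
    mgi_id, symbol, uniprot, name = row[0], row[1], row[2], row[3]
    return [
        mgi_id.removeprefix("MGI:"),                    # 01 DB_Object_ID: strip 'MGI:'
        symbol,                                         # 02 DB_Object_Symbol: Column 2
        name,                                           # 03 DB_Object_Name: Column 4
        "",                                             # 04 DB_Object_Synonym(s): none
        "gene",                                         # 05 DB_Object_Type: fixed
        "taxon:10090",                                  # 06 Taxon: fixed
        mgi_id,                                         # 07 Parent_Object_ID: Column 1 as-is
        f"UniProtKB:{uniprot}" if uniprot else "",      # 08 DB_Xref(s): Column 3 may be empty
        "",                                             # 09 Gene_Product_Properties: none
    ]
```

Using the Adam21 example from the table, the input row `MGI:1861229,Adam21,Q9JI76,a disintegrin and metallopeptidase domain 21` maps to DB_Object_ID `1861229`, Parent_Object_ID `MGI:1861229`, and DB_Xref `UniProtKB:Q9JI76`.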

Specifications for WB gpi file
