Revision as of 15:30, 6 October 2016

gpi File

We will need to create a tab-delimited gpi file with each WormBase release.

These specifications are based, in part, on the documentation in the GOC's go-annotation GitHub repository:

and also on the content of files submitted here:

wb_nematode.gpi.gz (all nematode species in WB)

Header content:

!gpi-version: 1.2

!Project_name: WormBase

!WB_release: WS256

!Contact Email: help@wormbase.org

!Date: 20161006
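
The header lines above could be generated per release with something like the following. This is a minimal sketch, not the production script: the function name `gpi_header` is hypothetical, and it assumes every header line starts with a single `!` and that the date is written as YYYYMMDD, as in the example above.

```python
from datetime import date


def gpi_header(release, when):
    """Return the gpi header lines for one WormBase release.

    `release` is a release name such as "WS256"; `when` is a datetime.date.
    Assumption: each header line begins with a single "!".
    """
    return [
        "!gpi-version: 1.2",
        "!Project_name: WormBase",
        "!WB_release: {}".format(release),
        "!Contact Email: help@wormbase.org",
        "!Date: {}".format(when.strftime("%Y%m%d")),
    ]


# Example: the header for the release shown above.
header_lines = gpi_header("WS256", date(2016, 10, 6))
print("\n".join(header_lines))
```

Passing the release name and date as arguments means only those two values change from one release to the next.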

column	name	required?	cardinality	GAF column	Example for WormBase (Gene OR Transcript OR Protein)	Tag in AceDB model
01	DB	required	1	1	WB	n/a
02	DB_Object_ID	required	1	2/17	WBGene00006796 OR F28F12.2a OR WP:CE21219	Gene OR Transcript OR Protein ID
03	DB_Object_Symbol	required	1	3	unc-62 OR unc-62 OR UNC-62	Public_name in ?Gene model or capitalized version of Public_name in ?Gene model
04	DB_Object_Name	optional	0 or 1	10	n/a	n/a
05	DB_Object_Synonym(s)	optional	0 or greater	11	ceh-25 OR ceh-25 OR CEH-25 (showing one, but we would include all Other_name entries)	Other_name in ?Gene model or capitalized version of Other_name in ?Gene model
06	DB_Object_Type	required	1	12	gene OR transcript OR protein	n/a	n/a	gene
07	Taxon	required	1	13	taxon:6239	n/a	n/a	taxon:6239
08	Parent_Object_ID	optional	0 or 1	-	WB:WBGene00000035	WBGene ID from Column 1 of TM output prefaced with 'WB:'	n/a	n/a
09	DB_Xref(s)	optional	0 or greater	-	-	UniProtKB:P38433	n/a	for all, if value exists, add unique values from xrefs.txt as follows: Column 7 of xrefs.txt prefaced with 'CCD:' and Column 8 of xrefs.txt prefaced with "UniProtKB:"; if no values, then skip	n/a
10	Gene_Product_Properties	optional	0 or greater	-	See Note 4 below	n/a	n/a	n/a	n/a
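
As an illustration of the ten-column layout in the table above, one line of the file could be assembled as follows. This is a sketch under assumptions: the function `gpi_row` is hypothetical, multi-valued columns are assumed to be joined with `|`, optional columns with no value are assumed to be left empty, and the pairing of the protein example with its parent gene is for illustration only.

```python
def gpi_row(object_id, symbol, object_type, name="", synonyms=(),
            taxon="taxon:6239", parent_id="", xrefs=(), properties=()):
    """Assemble one tab-delimited gpi line for a WormBase object.

    Assumptions: multi-valued columns are pipe-separated, and optional
    columns with no value are written as empty strings.
    """
    fields = [
        "WB",                  # 01 DB
        object_id,             # 02 DB_Object_ID
        symbol,                # 03 DB_Object_Symbol
        name,                  # 04 DB_Object_Name
        "|".join(synonyms),    # 05 DB_Object_Synonym(s)
        object_type,           # 06 DB_Object_Type
        taxon,                 # 07 Taxon
        parent_id,             # 08 Parent_Object_ID
        "|".join(xrefs),       # 09 DB_Xref(s)
        "|".join(properties),  # 10 Gene_Product_Properties
    ]
    return "\t".join(fields)


# Illustrative protein entry built from the example values in the table.
example = gpi_row(
    object_id="WP:CE21219",
    symbol="UNC-62",
    object_type="protein",
    synonyms=["CEH-25"],
    parent_id="WB:WBGene00006796",
)
print(example)
```

Keeping the taxon as a parameter allows the same function to be used for other nematode species in the combined file.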

These specifications are based on the documentation on the GO wiki:

The input file is a comma-separated text file converted from an Excel spreadsheet sent to us by MGI.

The file is called mgi_gpi_input.txt. It contains carriage-return characters, which may need to be removed.

and is located on mangolassi here: /home/acedb/kimberly/ccc/ccc_gpi/mgi/input_files

There are duplicate entries in the mgi_gpi_input file that we will need to remove before we create the gpi file.
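
The clean-up described above (stripping carriage returns and removing duplicate entries) might look like the sketch below. The function name `read_unique_rows` is hypothetical, and it assumes two rows count as duplicates only when every field matches after surrounding whitespace is trimmed.

```python
import csv
import io


def read_unique_rows(text):
    """Parse comma-separated text and return its rows without duplicates.

    Carriage returns are normalised to "\n" first. The first occurrence of
    each row is kept, so the original order is preserved.
    """
    cleaned = text.replace("\r\n", "\n").replace("\r", "\n")
    seen = set()
    unique = []
    for row in csv.reader(io.StringIO(cleaned)):
        if not row:
            continue  # skip blank lines
        key = tuple(field.strip() for field in row)
        if key in seen:
            continue  # duplicate entry, already kept once
        seen.add(key)
        unique.append(list(key))
    return unique


# Two identical entries with Windows-style line endings collapse to one.
sample = (
    "MGI:1861229,Adam21,Q9JI76,a disintegrin and metallopeptidase domain 21\r\n"
    "MGI:1861229,Adam21,Q9JI76,a disintegrin and metallopeptidase domain 21\r\n"
)
rows = read_unique_rows(sample)
print(len(rows))
```

In practice the text would be read from mgi_gpi_input.txt rather than from an in-memory string.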

column	name	required?	cardinality	GAF column	Example for MGI	Column in mgi_gpi_input	Value if not in mgi_gpi_input (i.e. fixed value)
01	DB_Object_ID	required	1	2/17	1861229	Column 1, stripped of 'MGI:' prefix	n/a
02	DB_Object_Symbol	required	1	3	Adam21	Column 2, no changes	n/a
03	DB_Object_Name	optional	0 or 1	10	a disintegrin and metallopeptidase domain 21	Column 4, no changes	n/a
04	DB_Object_Synonym(s)	optional	0 or greater	11	n/a	n/a	n/a
05	DB_Object_Type	required	1	12	n/a	n/a	gene
06	Taxon	required	1	13	n/a	n/a	taxon:10090
07	Parent_Object_ID	optional	0 or 1	-	MGI:1861229	Column 1, no changes	n/a
08	DB_Xref(s)	optional	0 or greater	-	UniProtKB:Q9JI76	Column 3, add 'UniProtKB:' as prefix (note there are some entries with no value in Column 3)	n/a
09	Gene_Product_Properties	optional	0 or greater	-	n/a	n/a	n/a
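
The column mapping in the table above can be sketched as a small conversion function. The name `mgi_row_to_gpi` is hypothetical, and the input row is assumed to hold the MGI ID, symbol, UniProt accession, and name in columns 1 to 4, as the table describes; the xref column is left empty when column 3 has no value.

```python
def mgi_row_to_gpi(row):
    """Convert one mgi_gpi_input row into a nine-column gpi line."""
    # Pad short rows so that a missing trailing column reads as empty.
    mgi_id, symbol, uniprot, name = (list(row) + [""] * 4)[:4]
    bare_id = mgi_id[4:] if mgi_id.startswith("MGI:") else mgi_id
    fields = [
        bare_id,                                    # 01 DB_Object_ID
        symbol,                                     # 02 DB_Object_Symbol
        name,                                       # 03 DB_Object_Name
        "",                                         # 04 DB_Object_Synonym(s)
        "gene",                                     # 05 DB_Object_Type (fixed)
        "taxon:10090",                              # 06 Taxon (fixed)
        mgi_id,                                     # 07 Parent_Object_ID
        "UniProtKB:" + uniprot if uniprot else "",  # 08 DB_Xref(s)
        "",                                         # 09 Gene_Product_Properties
    ]
    return "\t".join(fields)


line = mgi_row_to_gpi(
    ["MGI:1861229", "Adam21", "Q9JI76",
     "a disintegrin and metallopeptidase domain 21"]
)
print(line)
```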
