Specifications for WB gpi file
gpi File
We will need to create a tab-delimited gpi file with each WormBase release.
Specifications Source
These specifications are based, in part, on the documentation in the GOC's go-annotation GitHub repository:
https://github.com/geneontology/go-annotation/blob/master/specs/gpad-gpi-1_2.md
and also on the content of files submitted here:
http://viewvc.geneontology.org/viewvc/GO-SVN/trunk/gpad-gpi/submission/
File Name
wb_nematode.gpi.gz (all nematode species in WB)
Header
Header content:
!gpi-version: 1.2
!Project_name: WormBase
!WB_release: WS256
!Contact Email: help@wormbase.org
!URL: http://www.wormbase.org
!Date: 20161006
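Because the release and date lines change with every WormBase release, the header can be generated rather than edited by hand. The following is a minimal sketch, not the production script: the function name `gpi_header` is hypothetical, and it assumes every header line starts with a single '!' and that the date is written as YYYYMMDD, as in the example above.

```python
from datetime import date


def gpi_header(release: str, when: date) -> list[str]:
    """Return the gpi header lines for one WormBase release.

    Only the release name and the date vary; the other lines are copied
    from the header content listed in this specification.
    """
    return [
        "!gpi-version: 1.2",
        "!Project_name: WormBase",
        f"!WB_release: {release}",          # assumption: single leading '!'
        "!Contact Email: help@wormbase.org",
        "!URL: http://www.wormbase.org",
        f"!Date: {when:%Y%m%d}",            # assumption: YYYYMMDD format
    ]


if __name__ == "__main__":
    print("\n".join(gpi_header("WS256", date(2016, 10, 6))))
```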
Field Values
column | name | required? | cardinality | GAF column | Example for WormBase (Gene OR Transcript OR Protein) | Tag in AceDB model | |||
---|---|---|---|---|---|---|---|---|---|
01 | DB | required | 1 | 1 | WB | n/a | |||
02 | DB_Object_ID | required | 1 | 2/17 | WBGene00006605 OR C15F1.3a OR WP:CE23546 | Gene OR Transcript OR Protein ID | |||
03 | DB_Object_Symbol | required | 1 | 3 | tra-2 OR tra-2 OR TRA-2 | Public_name in ?Gene model or capitalized version of Public_name in ?Gene model | |||
04 | DB_Object_Name | optional | 0 or 1 | 10 | n/a | n/a | |||
05 | DB_Object_Synonym(s) | optional | 0 or greater | 11 | | Other_name in ?Gene model or capitalized version of Other_name in ?Gene model | |||
06 | DB_Object_Type | required | 1 | 12 | gene | n/a | n/a | gene | |
07 | Taxon | required | 1 | 13 | taxon:6239 | n/a | n/a | taxon:6239 | |
08 | Parent_Object_ID | optional | 0 or 1 | - | WB:WBGene00000035 | WBGene ID from Column 1 of TM output, prefixed with 'WB:' | n/a | n/a | |
09 | DB_Xref(s) | optional | 0 or greater | - | - | UniProtKB:P38433 | n/a | For all object types: if values exist, add the unique values from xrefs.txt, with Column 7 prefixed with 'CCD:' and Column 8 prefixed with 'UniProtKB:'; if there are no values, skip | n/a |
10 | Gene_Product_Properties | optional | 0 or greater | - | See Note 4 below | n/a | n/a | n/a | n/a |
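
The DB_Xref(s) rule in row 09 can be sketched as follows. This is an illustration under stated assumptions, not the production code: the function name `db_xrefs` is hypothetical, each input row is assumed to be one line of xrefs.txt already split on tabs and already selected for a single object, and multiple values are joined with '|' on the assumption that the gpi 1.2 format uses a pipe as its multi-value separator.

```python
def db_xrefs(rows: list[list[str]]) -> str:
    """Build the DB_Xref(s) value from the xrefs.txt rows of one object.

    Column 7 is prefixed with 'CCD:' and Column 8 with 'UniProtKB:'.
    Columns are 1-based in the specification, hence indexes 6 and 7.
    Empty values are skipped and duplicates are dropped, keeping the
    order in which values were first seen.
    """
    unique: list[str] = []
    for row in rows:
        for index, prefix in ((6, "CCD:"), (7, "UniProtKB:")):
            value = row[index].strip() if len(row) > index else ""
            if not value:
                continue  # no value in this column: skip
            xref = prefix + value
            if xref not in unique:
                unique.append(xref)
    # Assumption: '|' separates multiple values within one gpi column.
    return "|".join(unique)
```

An object with no values in either column yields an empty string, which leaves the optional column blank.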
MGI
These specifications are based on the documentation on the GO wiki:
The input file is a comma-separated text file converted from an Excel spreadsheet sent to us by MGI.
The file is named mgi_gpi_input.txt and is located on mangolassi at: /home/acedb/kimberly/ccc/ccc_gpi/mgi/input_files
Note that the file contains carriage-return characters; it is still an open question whether these need to be removed.
There are duplicate entries in the mgi_gpi_input file that we will need to remove before we create the gpi file.
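The clean-up of the input file could look something like the sketch below. It is only an illustration: the function name `read_mgi_rows` is hypothetical, it assumes the carriage returns should indeed be removed (still an open question above), and it treats two entries as duplicates only when every column matches after surrounding whitespace is trimmed.

```python
import csv
import io


def read_mgi_rows(text: str) -> list[list[str]]:
    """Parse the comma-separated MGI input, dropping duplicate entries."""
    # Normalise Windows (\r\n) and bare carriage-return (\r) line endings.
    cleaned = text.replace("\r\n", "\n").replace("\r", "\n")
    rows: list[list[str]] = []
    seen: set[tuple[str, ...]] = set()
    for row in csv.reader(io.StringIO(cleaned)):
        key = tuple(cell.strip() for cell in row)
        if not any(key):
            continue  # blank line
        if key in seen:
            continue  # exact duplicate of an earlier entry
        seen.add(key)
        rows.append(list(key))
    return rows
```

The first occurrence of each entry is kept, so the order of the input file is preserved.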
column | name | required? | cardinality | GAF column | Example for MGI | Column in mgi_gpi_input | Value if not in mgi_gpi_input (i.e. fixed value) |
---|---|---|---|---|---|---|---|
01 | DB_Object_ID | required | 1 | 2/17 | 1861229 | Column 1, stripped of 'MGI:' prefix | n/a |
02 | DB_Object_Symbol | required | 1 | 3 | Adam21 | Column 2, no changes | n/a |
03 | DB_Object_Name | optional | 0 or 1 | 10 | a disintegrin and metallopeptidase domain 21 | Column 4, no changes | n/a |
04 | DB_Object_Synonym(s) | optional | 0 or greater | 11 | n/a | n/a | n/a |
05 | DB_Object_Type | required | 1 | 12 | n/a | n/a | gene |
06 | Taxon | required | 1 | 13 | n/a | n/a | taxon:10090 |
07 | Parent_Object_ID | optional | 0 or 1 | - | MGI:1861229 | Column 1, no changes | n/a |
08 | DB_Xref(s) | optional | 0 or greater | - | UniProtKB:Q9JI76 | Column 3, add 'UniProtKB:' as prefix (note there are some entries with no value in Column 3) | n/a |
09 | Gene_Product_Properties | optional | 0 or greater | - | n/a | n/a | n/a |
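
The column mapping in the table above can be sketched as one function per input row. This is a minimal illustration, not the actual conversion script: the function name `mgi_row_to_gpi` is hypothetical, the input row is assumed to be already split into its columns, and the output is assumed to be tab-delimited like the WormBase gpi file.

```python
def mgi_row_to_gpi(row: list[str]) -> str:
    """Convert one row of mgi_gpi_input into one tab-delimited gpi line."""
    # Pad short rows so that a missing Column 3 or 4 becomes an empty string.
    mgi_id, symbol, uniprot, name = (
        cell.strip() for cell in (list(row) + [""] * 4)[:4]
    )
    object_id = mgi_id[4:] if mgi_id.startswith("MGI:") else mgi_id
    fields = [
        object_id,                                   # 01 Column 1 without 'MGI:'
        symbol,                                      # 02 Column 2, no changes
        name,                                        # 03 Column 4, no changes
        "",                                          # 04 synonyms: n/a
        "gene",                                      # 05 fixed value
        "taxon:10090",                               # 06 fixed value
        mgi_id,                                      # 07 Column 1, no changes
        f"UniProtKB:{uniprot}" if uniprot else "",   # 08 Column 3 with prefix
        "",                                          # 09 properties: n/a
    ]
    return "\t".join(fields)
```

Entries with no value in Column 3 produce an empty DB_Xref(s) column rather than a bare 'UniProtKB:' prefix.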