Specifications for WB gpi file
Revision as of 15:30, 6 October 2016
gpi File
We will need to create a tab-delimited gpi file with each WormBase release.
Specifications Source
These specifications are based, in part, on the documentation on the GOC's go-annotation github repository:
https://github.com/geneontology/go-annotation/blob/master/specs/gpad-gpi-1_2.md
and also on the content of files submitted here:
http://viewvc.geneontology.org/viewvc/GO-SVN/trunk/gpad-gpi/submission/
File Name
wb_nematode.gpi.gz (all nematode species in WB)
Header
Header content:
!gpi-version: 1.2
!Project_name: WormBase
!WB_release: WS256
!Contact Email: help@wormbase.org
!URL: http://www.wormbase.org
!Date: 20161006
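As a rough illustration of how the header might be produced for each release, here is a minimal Python sketch. The function names `gpi_header` and `write_gpi` are hypothetical, and the sketch assumes every header line starts with a single "!" and that the release name and date are supplied by the caller.

```python
import gzip
from datetime import date


def gpi_header(release, today):
    """Return the gpi header lines for one WormBase release.

    Assumption: each header line begins with a single '!'.
    """
    return [
        "!gpi-version: 1.2",
        "!Project_name: WormBase",
        f"!WB_release: {release}",
        "!Contact Email: help@wormbase.org",
        "!URL: http://www.wormbase.org",
        f"!Date: {today:%Y%m%d}",  # e.g. 20161006
    ]


def write_gpi(path, release, today, rows):
    """Write the header, then one tab-delimited line per row, gzip-compressed.

    `rows` is an iterable of lists of strings (one list per gpi line).
    """
    with gzip.open(path, "wt", encoding="utf-8", newline="\n") as out:
        for line in gpi_header(release, today):
            out.write(line + "\n")
        for row in rows:
            out.write("\t".join(row) + "\n")
```

For example, `write_gpi("wb_nematode.gpi.gz", "WS256", date(2016, 10, 6), rows)` would write the header shown above followed by the data lines.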
Field Values
column | name | required? | cardinality | GAF column | Example for WormBase (Gene OR Transcript OR Protein) | Tag in AceDB model (or other source) |
---|---|---|---|---|---|---|
01 | DB | required | 1 | 1 | WB | n/a |
02 | DB_Object_ID | required | 1 | 2/17 | WBGene00006796 OR F28F12.2a OR WP:CE21219 | Gene OR Transcript OR Protein ID |
03 | DB_Object_Symbol | required | 1 | 3 | unc-62 OR unc-62 OR UNC-62 | Public_name in ?Gene model or capitalized version of Public_name in ?Gene model |
04 | DB_Object_Name | optional | 0 or 1 | 10 | n/a | n/a |
05 | DB_Object_Synonym(s) | optional | 0 or greater | 11 | ceh-25 OR ceh-25 OR CEH-25 (showing one, but we would include all Other_name entries) | Other_name in ?Gene model or capitalized version of Other_name in ?Gene model |
06 | DB_Object_Type | required | 1 | 12 | gene OR transcript OR protein | n/a |
07 | Taxon | required | 1 | 13 | taxon:6239 | n/a |
08 | Parent_Object_ID | optional | 0 or 1 | - | WB:WBGene00000035 | WBGene ID from Column 1 of TM output, prefixed with 'WB:' |
09 | DB_Xref(s) | optional | 0 or greater | - | UniProtKB:P38433 | For all objects, if values exist, add the unique values from xrefs.txt as follows: Column 7 of xrefs.txt prefixed with 'CCD:' and Column 8 of xrefs.txt prefixed with 'UniProtKB:'; if there are no values, skip |
10 | Gene_Product_Properties | optional | 0 or greater | - | See Note 4 below | n/a |
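To make the column layout concrete, the following Python sketch assembles one ten-column line from values that have already been pulled from the database. The function name `wb_gpi_row` and its parameters are hypothetical. The sketch assumes that multiple synonyms or xrefs are joined with a pipe character, that optional columns without a value are left empty, and that "capitalized version" means the fully upper-cased name (unc-62 becomes UNC-62).

```python
def wb_gpi_row(object_id, symbol, synonyms=(), object_type="gene",
               taxon="taxon:6239", parent_id="", xrefs=()):
    """Build one tab-delimited gpi line in the column order of the table.

    Assumptions: multi-valued columns are pipe-separated, and empty
    optional columns are written as empty strings.
    """
    if object_type == "protein":
        # Protein symbols and synonyms use the capitalized gene names.
        symbol = symbol.upper()
        synonyms = [name.upper() for name in synonyms]
    fields = [
        "WB",                 # 01 DB
        object_id,            # 02 DB_Object_ID
        symbol,               # 03 DB_Object_Symbol
        "",                   # 04 DB_Object_Name (not populated)
        "|".join(synonyms),   # 05 DB_Object_Synonym(s)
        object_type,          # 06 DB_Object_Type
        taxon,                # 07 Taxon
        parent_id,            # 08 Parent_Object_ID
        "|".join(xrefs),      # 09 DB_Xref(s)
        "",                   # 10 Gene_Product_Properties
    ]
    return "\t".join(fields)
```

A gene line could then be built with `wb_gpi_row("WBGene00006796", "unc-62", synonyms=["ceh-25"])`, and a protein line with `wb_gpi_row("WP:CE21219", "unc-62", synonyms=["ceh-25"], object_type="protein")`.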
MGI
These specifications are based on the documentation on the GO wiki:
The input file is a comma-separated text file converted from an Excel spreadsheet sent to us by MGI.
The file is called mgi_gpi_input.txt (it contains carriage-return characters; do these need to be removed?)
and is located on mangolassi here: /home/acedb/kimberly/ccc/ccc_gpi/mgi/input_files
There are duplicate entries in the mgi_gpi_input file that we will need to remove before we create the gpi file.
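One possible way to handle both the carriage returns and the duplicates is sketched below in Python. The function name `read_unique_rows` is hypothetical, and the sketch assumes that "duplicate" means two rows whose cells are all identical after trimming whitespace; if duplicates should instead be judged on the MGI ID alone, the key would need to change.

```python
import csv
import io


def read_unique_rows(text):
    """Parse comma-separated text, dropping carriage returns and repeated rows.

    Rows are returned in their original order; only the first copy of
    each duplicate is kept.
    """
    # Normalize Windows (\r\n) and old Mac (\r) line endings to \n.
    cleaned = text.replace("\r\n", "\n").replace("\r", "\n")
    seen = set()
    unique = []
    for row in csv.reader(io.StringIO(cleaned)):
        if not row:
            continue  # skip blank lines
        key = tuple(cell.strip() for cell in row)
        if key not in seen:
            seen.add(key)
            unique.append(list(key))
    return unique
```

The file contents could be passed in with something like `read_unique_rows(open("mgi_gpi_input.txt", newline="").read())`.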
column | name | required? | cardinality | GAF column | Example for MGI | Column in mgi_gpi_input | Value if not in mgi_gpi_input (i.e. fixed value) |
---|---|---|---|---|---|---|---|
01 | DB_Object_ID | required | 1 | 2/17 | 1861229 | Column 1, stripped of 'MGI:' prefix | n/a |
02 | DB_Object_Symbol | required | 1 | 3 | Adam21 | Column 2, no changes | n/a |
03 | DB_Object_Name | optional | 0 or 1 | 10 | a disintegrin and metallopeptidase domain 21 | Column 4, no changes | n/a |
04 | DB_Object_Synonym(s) | optional | 0 or greater | 11 | n/a | n/a | n/a |
05 | DB_Object_Type | required | 1 | 12 | n/a | n/a | gene |
06 | Taxon | required | 1 | 13 | n/a | n/a | taxon:10090 |
07 | Parent_Object_ID | optional | 0 or 1 | - | MGI:1861229 | Column 1, no changes | n/a |
08 | DB_Xref(s) | optional | 0 or greater | - | UniProtKB:Q9JI76 | Column 3, add 'UniProtKB:' as prefix (note there are some entries with no value in Column 3) | n/a |
09 | Gene_Product_Properties | optional | 0 or greater | - | n/a | n/a | n/a |
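The mapping in the MGI table can be sketched in Python as follows. The function name `mgi_gpi_line` is hypothetical, and the sketch assumes each input row is a list of at least four cells in the order MGI ID, symbol, UniProtKB accession, name, as implied by the "Column in mgi_gpi_input" entries.

```python
def mgi_gpi_line(row):
    """Convert one mgi_gpi_input row into a nine-column, tab-delimited line."""
    # Pad short rows so a missing trailing cell does not raise an error.
    cells = [cell.strip() for cell in row[:4]]
    cells += [""] * (4 - len(cells))
    mgi_id, symbol, uniprot, name = cells

    # Column 1 loses its 'MGI:' prefix for DB_Object_ID.
    bare_id = mgi_id[4:] if mgi_id.startswith("MGI:") else mgi_id
    # Some entries have no value in Column 3; leave DB_Xref(s) empty then.
    xref = f"UniProtKB:{uniprot}" if uniprot else ""

    fields = [
        bare_id,        # 01 DB_Object_ID
        symbol,         # 02 DB_Object_Symbol
        name,           # 03 DB_Object_Name
        "",             # 04 DB_Object_Synonym(s)
        "gene",         # 05 DB_Object_Type (fixed value)
        "taxon:10090",  # 06 Taxon (fixed value)
        mgi_id,         # 07 Parent_Object_ID (Column 1, unchanged)
        xref,           # 08 DB_Xref(s)
        "",             # 09 Gene_Product_Properties
    ]
    return "\t".join(fields)
```

Applied to the example row in the table, this would give a line whose first column is 1861229, whose seventh column is MGI:1861229, and whose eighth column is UniProtKB:Q9JI76.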