Specifications for WB gpi file


gpi File

We will need to create a tab-delimited gpi file with each WormBase release.

Specifications Source

These specifications are based, in part, on the documentation on the GOC's go-annotation github repository:

https://github.com/geneontology/go-annotation/blob/master/specs/gpad-gpi-1_2.md

and also on the content of files submitted here:

http://viewvc.geneontology.org/viewvc/GO-SVN/trunk/gpad-gpi/submission/

File Name

wb_nematode.gpi.gz (all nematode species in WB)

Header

Header content:

!gpi-version: 1.2

!Project_name: WormBase

!WB_release: WS256

!Contact Email: help@wormbase.org

!URL: http://www.wormbase.org

!Date: 20161006
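
The header above could be generated per release with something like the following minimal sketch. The function name `gpi_header` and its parameters are hypothetical, and the release line is written with a single leading `!` like the other header lines, which is an assumption.

```python
from datetime import date


def gpi_header(ws_release: str, on: date) -> str:
    """Return the GPI header block for one WormBase release (hypothetical helper)."""
    lines = [
        "!gpi-version: 1.2",
        "!Project_name: WormBase",
        f"!WB_release: {ws_release}",  # assumption: single '!' prefix
        "!Contact Email: help@wormbase.org",
        "!URL: http://www.wormbase.org",
        f"!Date: {on:%Y%m%d}",  # date formatted as YYYYMMDD, e.g. 20161006
    ]
    return "\n".join(lines) + "\n"
```

Calling `gpi_header("WS256", date(2016, 10, 6))` would reproduce the example header for WS256.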

Field Values

column | name | required? | cardinality | GAF column | Example for WormBase (Gene OR Transcript OR Protein) | Tag in AceDB model
01 | DB | required | 1 | 1 | WB | n/a
02 | DB_Object_ID | required | 1 | 2/17 | WBGene00006605 OR C15F1.3a OR WP:CE23546 | Gene OR Transcript OR Protein ID
03 | DB_Object_Symbol | required | 1 | 3 | tra-2 OR tra-2 OR TRA-2 | Public_name in ?Gene model or capitalized version of Public_name in ?Gene model
04 | DB_Object_Name | optional | 0 or 1 | 10 | n/a | n/a
05 | DB_Object_Synonym(s) | optional | 0 or greater | 11 | | Other_name in ?Gene model or capitalized version of Other_name in ?Gene model
06 | DB_Object_Type | required | 1 | 12 | gene | n/a n/a gene
07 | Taxon | required | 1 | 13 | taxon:6239 | n/a n/a taxon:6239
08 | Parent_Object_ID | optional | 0 or 1 | - | WB:WBGene00000035 | WBGene ID from Column 1 of TM output prefaced with 'WB:' n/a n/a
09 | DB_Xref(s) | optional | 0 or greater | - | - UniProtKB:P38433 | n/a for all, if value exists, add unique values from xrefs.txt as follows: Column 7 of xrefs.txt prefaced with 'CCD:' and Column 8 of xrefs.txt prefaced with "UniProtKB:"; if no values, then skip n/a
10 | Gene_Product_Properties | optional | 0 or greater | - | See Note 4 below | n/a n/a n/a n/a
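
The DB_Xref(s) rule in row 09 can be sketched as follows. This is an illustration only: the function name `build_db_xrefs` is hypothetical, the treatment of an empty string or `.` as "no value" is an assumption about xrefs.txt, and joining multiple values with `|` is assumed to be the GPI convention for multi-valued columns.

```python
from typing import Iterable, Sequence

# Assumption: these placeholders mean "no value" in xrefs.txt.
MISSING = {"", "."}

# The spec counts columns from 1, so Column 7 is index 6 and Column 8 is index 7.
XREF_COLUMNS = ((6, "CCD:"), (7, "UniProtKB:"))


def build_db_xrefs(rows: Iterable[Sequence[str]]) -> str:
    """Collect unique, prefixed xrefs from the xrefs.txt rows of one object."""
    xrefs = []
    for row in rows:
        for index, prefix in XREF_COLUMNS:
            value = row[index].strip() if len(row) > index else ""
            if value in MISSING:
                continue  # "if no values, then skip"
            xref = prefix + value
            if xref not in xrefs:  # keep unique values in first-seen order
                xrefs.append(xref)
    return "|".join(xrefs)
```

When no row carries a value in either column, the function returns an empty string, so the DB_Xref(s) field is simply left empty.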

MGI

These specifications are based on the documentation on the GO wiki:

http://wiki.geneontology.org/index.php/Final_GPAD_and_GPI_file_format#Final_format_.2809_Jan_2013.29_2

The input file is a comma-separated text file converted from an Excel spreadsheet sent to us by MGI.

The file is called mgi_gpi_input.txt and is located on mangolassi here: /home/acedb/kimberly/ccc/ccc_gpi/mgi/input_files

The file contains carriage return characters, which may need to be removed.

There are duplicate entries in the mgi_gpi_input file that we will need to remove before we create the gpi file.
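
One possible way to handle both the carriage returns and the duplicate entries is sketched below. The function name `clean_input_lines` is hypothetical, and the sketch assumes that duplicates are lines that match exactly once line endings are normalized.

```python
def clean_input_lines(raw: str) -> list[str]:
    """Normalize line endings, then drop blank lines and exact duplicate lines."""
    # Convert Windows (\r\n) and bare carriage return (\r) endings to \n.
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    seen = set()
    kept = []
    for line in text.split("\n"):
        if not line.strip() or line in seen:
            continue
        seen.add(line)
        kept.append(line)  # the first occurrence wins, input order is preserved
    return kept
```

Entries that differ only in whitespace or letter case would not be merged by this sketch and would need a separate check.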



column | name | required? | cardinality | GAF column | Example for MGI | Column in mgi_gpi_input | Value if not in mgi_gpi_input (i.e. fixed value)
01 | DB_Object_ID | required | 1 | 2/17 | 1861229 | Column 1, stripped of 'MGI:' prefix | n/a
02 | DB_Object_Symbol | required | 1 | 3 | Adam21 | Column 2, no changes | n/a
03 | DB_Object_Name | optional | 0 or 1 | 10 | a disintegrin and metallopeptidase domain 21 | Column 4, no changes | n/a
04 | DB_Object_Synonym(s) | optional | 0 or greater | 11 | n/a | n/a | n/a
05 | DB_Object_Type | required | 1 | 12 | n/a | n/a | gene
06 | Taxon | required | 1 | 13 | n/a | n/a | taxon:10090
07 | Parent_Object_ID | optional | 0 or 1 | - | MGI:1861229 | Column 1, no changes | n/a
08 | DB_Xref(s) | optional | 0 or greater | - | UniProtKB:Q9JI76 | Column 3, add 'UniProtKB:' as prefix (note there are some entries with no value in Column 3) | n/a
09 | Gene_Product_Properties | optional | 0 or greater | - | n/a | n/a | n/a
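
The MGI column mapping above could be applied to one input row roughly as follows. This is a sketch under stated assumptions: the function name `mgi_row_to_gpi` is hypothetical, the input is parsed with Python's `csv` module, each row is assumed to have at least four columns, and columns with no value are written as empty fields.

```python
import csv
import io


def mgi_row_to_gpi(row):
    """Map one parsed row of mgi_gpi_input.txt to a tab-delimited GPI line."""
    mgi_id, symbol, uniprot, name = (cell.strip() for cell in row[:4])
    fields = [
        mgi_id.removeprefix("MGI:"),              # 01 DB_Object_ID
        symbol,                                   # 02 DB_Object_Symbol
        name,                                     # 03 DB_Object_Name
        "",                                       # 04 DB_Object_Synonym(s)
        "gene",                                   # 05 DB_Object_Type (fixed)
        "taxon:10090",                            # 06 Taxon (fixed)
        mgi_id,                                   # 07 Parent_Object_ID
        f"UniProtKB:{uniprot}" if uniprot else "",  # 08 DB_Xref(s), may be empty
        "",                                       # 09 Gene_Product_Properties
    ]
    return "\t".join(fields)


# Usage with the example row from the table above.
sample = "MGI:1861229,Adam21,Q9JI76,a disintegrin and metallopeptidase domain 21\n"
for parsed in csv.reader(io.StringIO(sample)):
    print(mgi_row_to_gpi(parsed))
```

Parsing with `csv.reader` rather than splitting on commas keeps quoted gene names that contain commas in a single field. `str.removeprefix` requires Python 3.9 or later.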