Specifications for WB gpi file
Latest revision as of 20:46, 10 October 2016
gpi File
We will need to create a tab-delimited gpi file with each WormBase release.
Specifications Source
These specifications are based, in part, on the documentation in the GOC's go-annotation GitHub repository:
https://github.com/geneontology/go-annotation/blob/master/specs/gpad-gpi-1_2.md
and also on the content of files submitted here:
http://viewvc.geneontology.org/viewvc/GO-SVN/trunk/gpad-gpi/submission/
File Name
wb_nematode.gpi.gz (all nematode species in WB)
Header
Header content:
!gpi-version: 1.2
!Date: 2016-10-06
!Project_name: WormBase
!Release: WS256
!Contact_email: help@wormbase.org
!URL: http://www.wormbase.org
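As a minimal sketch of how this header might be generated for each release, the function below assembles the six header lines shown above. The function name `gpi_header` and its parameters are hypothetical; only the line contents come from this page.

```python
from datetime import date


def gpi_header(release: str, generated_on: date) -> str:
    """Return the gpi header block as one newline-terminated string.

    Hypothetical helper: only the header keys and fixed values are taken
    from the specification on this page.
    """
    lines = [
        "!gpi-version: 1.2",
        f"!Date: {generated_on.isoformat()}",  # ISO date, e.g. 2016-10-06
        "!Project_name: WormBase",
        f"!Release: {release}",                # e.g. WS256
        "!Contact_email: help@wormbase.org",
        "!URL: http://www.wormbase.org",
    ]
    return "\n".join(lines) + "\n"


# Reproduces the header shown above for release WS256.
print(gpi_header("WS256", date(2016, 10, 6)), end="")
```

Passing the release name and date as arguments keeps the only two values that change between releases out of the fixed text.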
Field Values
column | name | required? | cardinality | GAF column | Example for WormBase (Gene OR Transcript OR Protein) | Tag in AceDB model | Comment |
---|---|---|---|---|---|---|---|
01 | DB | required | 1 | 1 | WB | n/a | n/a |
02 | DB_Object_ID | required | 1 | 2/17 | WBGene00006796 OR T28F12.2a OR WP:CE21219 | Gene OR Transcript OR Protein ID | n/a |
03 | DB_Object_Symbol | required | 1 | 3 | unc-62 OR unc-62 OR UNC-62 | Public_name in ?Gene model or capitalized version of Public_name in ?Gene model | n/a |
04 | DB_Object_Name | optional | 0 or 1 | 10 | n/a | n/a | n/a |
05 | DB_Object_Synonym | optional | 0 or greater | 11 | ceh-25 OR ceh-25 OR CEH-25 (showing one, but we would include all Other_name entries) | Other_name in ?Gene model or capitalized version of Other_name in ?Gene model | This is a little different from what we currently put in the GAF, but I think the slightly different purpose of this file (Noctua, GO website searches, text mining) makes it beneficial to include the Other_name entries. When there are multiple entries, they should be pipe-separated. |
06 | DB_Object_Type | required | 1 | 12 | gene OR transcript OR protein | See Comment | For transcript, could we use the value in the Method tag? Do all of the values in this tag correspond to SO terms? It seems the CV of SO terms would be good to use here, if possible. |
07 | Taxon | required | 1 | 13 | taxon:6239 | n/a | NCBI taxonomy ID for corresponding species of entity in Column 2. |
08 | Parent_Object_ID | optional | 0 or 1 | n/a | WB:WBGene00006796 | Gene ID | The WB gene ID will be the parent ID for each transcript and protein entry. For gene entries, this field will be blank. |
09 | DB_Xref(s) | optional | 0 or greater | n/a | UniProtKB:Q9N5D6 (for gene) OR UniProtKB:Q9N5D6-1 (for protein) | For gene and transcript entries, see comment. | WBGene entries will contain a DB_Xref to the UniProtKB Reference Proteome accession, where available. The Reference Proteome accessions are available in this file: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/WORM/goa_worm.gpi.gz. Unfortunately, the goa_worm file does not currently xref to WBGene IDs. Transcript entries will contain a DB_Xref to RNAcentral accessions, where available. Protein entries will contain a DB_Xref to the UniProtKB accession; this accession will be an isoform accession of a reviewed Swiss-Prot entry or a TrEMBL accession (e.g., Q9N5D6-1 or A0A0K3ASC5). |
10 | Gene_Product_Properties | optional | 0 or greater | n/a | n/a | n/a | Right now, I can't think of anything we'd need to put in this field, but that could change in future iterations. |
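The column rules in the table above can be sketched as a small row formatter. This is an illustration under stated assumptions, not the production export code: the names `GPI_COLUMNS`, `REQUIRED`, `MULTI_VALUED`, and `format_gpi_row` are hypothetical, and joining DB_Xref(s) and Gene_Product_Properties with a pipe is assumed by analogy with the synonym rule (this page states the pipe separator only for DB_Object_Synonym).

```python
# Column order follows the Field Values table (columns 01 to 10).
GPI_COLUMNS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "DB_Object_Name",
    "DB_Object_Synonym", "DB_Object_Type", "Taxon",
    "Parent_Object_ID", "DB_Xref", "Gene_Product_Properties",
]
# Columns marked "required" in the table.
REQUIRED = {"DB", "DB_Object_ID", "DB_Object_Symbol", "DB_Object_Type", "Taxon"}
# Columns with cardinality "0 or greater"; pipe-joining the last two is an assumption.
MULTI_VALUED = {"DB_Object_Synonym", "DB_Xref", "Gene_Product_Properties"}


def format_gpi_row(record: dict) -> str:
    """Turn a record (column name -> str, or list of str) into one tab-delimited line."""
    cells = []
    for column in GPI_COLUMNS:
        raw = record.get(column)
        if column in MULTI_VALUED:
            cell = "|".join(raw or [])  # multiple entries are pipe-separated
        else:
            cell = raw or ""            # optional single-valued columns may be blank
        if column in REQUIRED and not cell:
            raise ValueError(f"missing required column: {column}")
        if "\t" in cell or "\n" in cell:
            raise ValueError(f"tab or newline inside column: {column}")
        cells.append(cell)
    return "\t".join(cells)


unc62_gene = {
    "DB": "WB",
    "DB_Object_ID": "WBGene00006796",
    "DB_Object_Symbol": "unc-62",
    "DB_Object_Synonym": ["let-318", "nob-5", "ceh-25", "CELE_T28F12.2"],
    "DB_Object_Type": "gene",
    "Taxon": "taxon:6239",
    "DB_Xref": ["UniProtKB:Q9N5D6"],
}
print(format_gpi_row(unc62_gene))
```

Blank optional columns are written as empty strings so that every line keeps exactly ten tab-separated fields.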
Example Entries
DB | DB_Object_ID | DB_Object_Symbol | DB_Object_Name | DB_Object_Synonym | DB_Object_Type | Taxon | Parent_Object_ID | DB_Xref(s) | Gene_Product_Properties |
---|---|---|---|---|---|---|---|---|---|
WB | WBGene00006796 | unc-62 | | let-318 pipe nob-5 pipe ceh-25 pipe CELE_T28F12.2 | gene | taxon:6239 | | UniProtKB:Q9N5D6 | |
WB | T28F12.2a | unc-62 | | let-318 pipe nob-5 pipe ceh-25 pipe CELE_T28F12.2 | coding_transcript | taxon:6239 | WB:WBGene00006796 | | |
WB | WP:CE21219 | UNC-62 | | LET-318 pipe NOB-5 pipe CEH-25 pipe CELE_T28F12.2 | protein | taxon:6239 | WB:WBGene00006796 | UniProtKB:Q9N5D6-1 | |
WB | WP:CE50189 | UNC-62 | | LET-318 pipe NOB-5 pipe CEH-25 pipe CELE_T28F12.2 | protein | taxon:6239 | WB:WBGene00006796 | UniProtKB:A0A0K3ASC5 | |
WB | WBGene00000829 | ctb-1 | | CYTB | gene | taxon:6239 | | UniProtKB:P24890 | |
WB | MTCE.21 | ctb-1 | | CYTB | coding_transcript | taxon:6239 | WB:WBGene00000829 | | |
WB | WP:CE35348 | CTB-1 | | CYTB | protein | taxon:6239 | WB:WBGene00000829 | | |
WB | WBGene00002993 | lin-4 | | CELE_F59G1.6 | gene | taxon:6239 | | | |
WB | F59G1.6 | lin-4 | | CELE_F59G1.6 | pre_miRNA | taxon:6239 | WB:WBGene00002993 | RNAcentral:URS00001E2999 | |
WB | F59G1.6a | lin-4 | | CELE_F59G1.6 | miRNA | taxon:6239 | WB:WBGene00002993 | RNAcentral:URS0000278C03 | |
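A generated file could be spot-checked against the Parent_Object_ID rule from the Field Values table (blank for gene entries, a WB gene ID for transcript and protein entries). The sketch below is one possible check; `parse_gpi_line` and `parent_rule_ok` are hypothetical helper names, and the check assumes that every non-gene entry must carry a `WB:WBGene` parent.

```python
COLUMNS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "DB_Object_Name",
    "DB_Object_Synonym", "DB_Object_Type", "Taxon",
    "Parent_Object_ID", "DB_Xref", "Gene_Product_Properties",
]


def parse_gpi_line(line: str) -> dict:
    """Split one tab-delimited gpi data line into a column-name -> value dict."""
    # Strip only the line ending; trailing tabs mark blank final columns.
    cells = line.rstrip("\r\n").split("\t")
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(cells)}")
    return dict(zip(COLUMNS, cells))


def parent_rule_ok(entry: dict) -> bool:
    """Genes must have a blank parent; all other types need a WB gene parent."""
    parent = entry["Parent_Object_ID"]
    if entry["DB_Object_Type"] == "gene":
        return parent == ""
    return parent.startswith("WB:WBGene")


# Two lin-4 rows from the example table, built with real tab separators.
gene_line = "\t".join(
    ["WB", "WBGene00002993", "lin-4", "", "CELE_F59G1.6",
     "gene", "taxon:6239", "", "", ""]) + "\n"
mirna_line = "\t".join(
    ["WB", "F59G1.6a", "lin-4", "", "CELE_F59G1.6", "miRNA", "taxon:6239",
     "WB:WBGene00002993", "RNAcentral:URS0000278C03", ""]) + "\n"

for line in (gene_line, mirna_line):
    entry = parse_gpi_line(line)
    print(entry["DB_Object_ID"], parent_rule_ok(entry))
```

Header lines (those starting with `!`) would need to be skipped before calling `parse_gpi_line` on a real file.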