GPAD to .go file

This page outlines how to convert a GPAD file to the .go file format.

Note that, if we can, we may want to switch over to submitting GPAD/GPI files directly for simplicity. My understanding is that the file specifications are still being finalized.

A table mapping gpad columns to .go:

gp_association files (GPAD)

N.B. The first line in the gp_association file should be:

!gpa-version: 1.1
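
As a minimal sketch, the version line could be checked before any conversion is attempted. The function name has_gpad_header is hypothetical and not part of any existing WormBase script; the check assumes the line must match exactly, apart from the line ending.

```python
# Expected first line of a gp_association file, as stated above.
EXPECTED_HEADER = "!gpa-version: 1.1"


def has_gpad_header(lines):
    """Return True if the first line of `lines` is the GPAD version line.

    `lines` is any iterable of strings, e.g. an open file handle.
    """
    for line in lines:
        # Only the first line is inspected; line endings are ignored.
        return line.rstrip("\r\n") == EXPECTED_HEADER
    # An empty file has no header at all.
    return False
```

Passing an open file handle (for example has_gpad_header(open(path))) works because file objects iterate line by line.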


Final format (09 Jan 2013)

| column | name | required? | cardinality | old column # | extra info | .go file equivalent | how to populate |
|---|---|---|---|---|---|---|---|
| 1 | DB | required | 1 | 1 | must be in xrf_abbs | | |
| 2 | DB_Object_ID | required | 1 | 2 | canonical or spliceform ID | | |
| 3 | Qualifier | required | 0 or greater | 4 | qualifiers to be confirmed | | |
| 4 | GO ID | required | 1 | 5 | must be extant GO ID | | |
| 5 | DB:Reference(s) | required | 1 or greater | 6 | DB must be in xrf_abbs | | |
| 6 | Evidence code | required | 1 | 7 | from ECO | | |
| 7 | With (or) From | optional | 0 or greater | 8 | | | |
| 8 | Interacting taxon ID (for multi-organism processes) | optional | 0 or 1 | 13 | NCBI taxon ID | | |
| 9 | Date | required | 1 | 14 | YYYYMMDD | | |
| 10 | Assigned_by | required | 1 | 15 | from xrf_abbs | | |
| 11 | Annotation Extension | optional | 0 or greater | 16 | | | |
| 12 | Annotation Properties | optional | 0 or greater | | See Note 1 below | | |
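
The column layout above could be read with something like the following sketch. It assumes the file is tab-delimited and that a column with a minimum cardinality of 0 may be left empty; the names GPAD_COLUMNS, parse_gpad_line and to_old_columns are hypothetical. It only re-keys values by the "old column #" from the table and does not attempt the .go mapping itself, since those table columns are still blank.

```python
from datetime import datetime

# (name, minimum cardinality, old column #), taken from the table above.
GPAD_COLUMNS = [
    ("DB", 1, 1),
    ("DB_Object_ID", 1, 2),
    ("Qualifier", 0, 4),
    ("GO ID", 1, 5),
    ("DB:Reference(s)", 1, 6),
    ("Evidence code", 1, 7),
    ("With (or) From", 0, 8),
    ("Interacting taxon ID", 0, 13),
    ("Date", 1, 14),
    ("Assigned_by", 1, 15),
    ("Annotation Extension", 0, 16),
    ("Annotation Properties", 0, None),  # no old column equivalent
]


def parse_gpad_line(line):
    """Split one GPAD data line into a dict keyed by column name."""
    fields = line.rstrip("\r\n").split("\t")  # assumption: tab-delimited
    if len(fields) > len(GPAD_COLUMNS):
        raise ValueError(f"expected at most 12 columns, got {len(fields)}")
    # Pad trailing optional columns that were omitted from the line.
    fields += [""] * (len(GPAD_COLUMNS) - len(fields))
    record = {}
    for (name, min_count, _old), value in zip(GPAD_COLUMNS, fields):
        if min_count > 0 and not value:
            raise ValueError(f"column '{name}' must not be empty")
        record[name] = value
    date = record["Date"]
    if len(date) != 8 or not date.isdigit():
        raise ValueError(f"Date is not YYYYMMDD: {date!r}")
    datetime.strptime(date, "%Y%m%d")  # rejects impossible dates
    return record


def to_old_columns(record):
    """Re-key a parsed record by the 'old column #' from the table."""
    return {old: record[name]
            for name, _min, old in GPAD_COLUMNS if old is not None}
```

For example, with a made-up line, to_old_columns(parse_gpad_line(line))[14] gives the date in its old column position. Multi-valued columns are left as raw strings here because their separators are not stated on this page.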

Notes

1. The Annotation Properties column can be filled with a pipe-separated list of "property_name = property_value" pairs. There will be a fixed vocabulary for the property names, and this list can be extended when necessary. The initial supported properties would be curator_name and annotation_identifier*, but the list can be extended to include e.g. curator_ID, modification_date, creation_date, annotation_notes, etc.

* curator_name and annotation_identifier will be useful for groups that are using Protein2GO for protein annotation who wish to maintain their annotations in their own database. These values can be used to keep track of individual annotations.
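
A minimal sketch of reading the Annotation Properties column as described in Note 1 follows. The function name parse_annotation_properties is hypothetical, and the sketch makes two assumptions: whitespace around "=" is optional, and if a property name is repeated the last value wins.

```python
def parse_annotation_properties(column):
    """Parse 'name = value|name = value' into a dict of strings."""
    properties = {}
    if not column.strip():
        # The column is optional, so an empty value means no properties.
        return properties
    for item in column.split("|"):
        # Split on the first '=' only, so values may contain '='.
        name, sep, value = item.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"not a 'name = value' pair: {item!r}")
        properties[name.strip()] = value.strip()
    return properties
```

With made-up values, parse_annotation_properties("curator_name = Jane Doe|annotation_identifier = 12345") yields a dict with the two initial property names as keys. Checking names against the fixed vocabulary is left out because that vocabulary is not listed here.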

Further questions/discussion points

1. Qualifiers column. a. Are the explicit relations mandatory? b. If so, what are they?

2. Evidence column. a. Chain of evidence

3. Annotation properties column. Tony has suggested including the GO evidence code here, to avoid using a lookup to reverse-engineer it from the file.