GPAD to .ace file


This page outlines how we will convert the GPAD file returned by UniProt into a .ace file for upload to citace.

The final file specifications for the GPAD file are available here:

http://wiki.geneontology.org/index.php/Final_GPAD_and_GPI_file_format

Things to confirm:

We get back from UniProt all C. elegans annotations, regardless of annotation source (e.g., WormBase, UniProt, IntAct).
We get back all data in the with/from column (e.g., variations, RNAi experiments).
How many duplicate annotations come from different sources? How do we want to handle them: take only the WB annotation, or keep multiple Curator_confirmed values (I think I favor the latter)?
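
One way to answer the duplicate question is to group annotations by everything except their source and list the groups that have more than one source. The sketch below is only illustrative: it assumes that "duplicate" means the same gene product, GO ID, reference and evidence code, and the function name and dictionary keys are hypothetical, not part of any existing pipeline.

```python
from collections import defaultdict

def find_multi_source_annotations(annotations):
    """Return annotations made by more than one source.

    Each annotation is a dict with the (hypothetical) keys object_id,
    go_id, reference, evidence and assigned_by. Two annotations count as
    duplicates when they differ only in assigned_by (an assumption).
    """
    sources = defaultdict(set)
    for ann in annotations:
        key = (ann["object_id"], ann["go_id"], ann["reference"], ann["evidence"])
        sources[key].add(ann["assigned_by"])
    # Keep only the groups that were annotated by at least two sources.
    return {key: sorted(names) for key, names in sources.items() if len(names) > 1}
```

If we keep multiple Curator_confirmed values, each source listed for a group would contribute one value; if we take only WB, a group would be kept once whenever WB is among its sources.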

A table mapping gpad columns to .ace:

gp_association files (GPAD)

N.B. The first line in the gp_association file should be:

!gpa-version: 1.1
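
A conversion script could check this header before reading any annotations. A minimal sketch, assuming the file is opened in text mode; the function name is hypothetical.

```python
import io

EXPECTED_HEADER = "!gpa-version: 1.1"

def has_gpad_header(handle):
    """Return True if the first line of the open file is the GPAD 1.1 header."""
    first_line = handle.readline().rstrip("\r\n")
    return first_line.strip() == EXPECTED_HEADER

# Usage with an in-memory file; a real script would pass an opened GPAD file.
sample = io.StringIO("!gpa-version: 1.1\n")
header_ok = has_gpad_header(sample)
```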

Final format (09 Jan 2013)

column	name	required?	cardinality	old column #	extra info	.ace file equivalent	how to populate
1	DB	required	1	1	must be in xrf_abbs	n/a	n/a
2	DB_Object_ID	required	1	2	canonical or spliceform ID	Gene : "WBGene00000001"	We will need to map UniProtKB IDs to WormBase WBGene IDs, using the current gp2protein file or, later, the gpi file. If this column contains a UniProtKB ID followed by a dash and a number, we can ignore the dash and the number when mapping to a WormBase ID.
3	Qualifier	required	0 or greater	4	qualifiers to be confirmed	n/a	Skip lines that have qualifiers
4	GO ID	required	1	5	must be extant GO ID	GO_term "GO:0008340"	Can take directly from file
5	DB:Reference(s)	required	1 or greater	6	DB must be in xrf_abbs	Paper_evidence "WBPaper00005614"	We will need to map PMIDs or DOIs to WBPaper IDs
6	Evidence code	required	1	7	from ECO	"IMP"	This column is populated with ECO codes, but the corresponding three-letter GO evidence code is in Column 12 with the heading go_evidence=
7	With (or) From	optional	0 or greater	8		n/a	n/a
8	Interacting taxon ID (for multi-organism processes)	optional	0 or 1	13	NCBI taxon ID	n/a	n/a
9	Date	required	1	14	YYYYMMDD	Date_last_updated evidence	Can take directly from file
10	Assigned_by	required	1	15	from xrf_abbs	Curator_confirmed	If WormBase, take the value from Annotation Properties that is preceded by curator_name=. If not WB, we will need to create Person objects (other thoughts?) for other databases or projects, e.g. UniProt, IntAct, RefGenome.
11	Annotation Extension	optional	0 or greater	16		n/a	n/a
12	Annotation Properties	optional	0 or greater		See Note 1 below	Curator_confirmed	Take this value only if Column 10 is populated with WormBase
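
The per-column rules in the table can be sketched as a small parser that turns one GPAD line into the fields we plan to carry over. This is an illustration only, not the final script: the function name and output keys are hypothetical, the set of Assigned_by values treated as WormBase is an assumption, and mapping UniProtKB IDs to WBGene IDs and PMIDs/DOIs to WBPaper IDs is left out because it depends on lookup files not shown here.

```python
import re

GPAD_COLUMNS = [
    "db", "db_object_id", "qualifier", "go_id", "references", "eco_code",
    "with_from", "interacting_taxon", "date", "assigned_by",
    "annotation_extension", "annotation_properties",
]

# Assumption: WormBase may appear in column 10 under either of these names.
WORMBASE_NAMES = {"WB", "WormBase"}

def parse_gpad_line(line):
    """Return a dict of the fields we keep, or None if the line is skipped."""
    if line.startswith("!") or not line.strip():
        return None  # header/comment lines and blank lines
    values = line.rstrip("\r\n").split("\t")
    values += [""] * (len(GPAD_COLUMNS) - len(values))  # pad missing trailing columns
    row = dict(zip(GPAD_COLUMNS, values))
    if row["qualifier"]:
        return None  # table rule for column 3: skip lines that have qualifiers
    # Column 12 is a pipe-separated list of name=value pairs.
    props = dict(
        item.split("=", 1) for item in row["annotation_properties"].split("|") if "=" in item
    )
    from_wb = row["assigned_by"] in WORMBASE_NAMES
    return {
        # Column 2: drop a trailing "-<number>" spliceform suffix before ID mapping.
        "object_id": re.sub(r"-\d+$", "", row["db_object_id"]),
        "go_id": row["go_id"],
        "references": row["references"].split("|"),
        # Column 6 holds an ECO code; the GO evidence code comes from go_evidence=.
        "go_evidence": props.get("go_evidence"),
        "date": row["date"],
        "assigned_by": row["assigned_by"],
        # Columns 10 and 12: take curator_name only for WormBase annotations.
        "curator_name": props.get("curator_name") if from_wb else None,
    }
```

The returned dict still holds UniProtKB and PMID identifiers, so the two mapping steps described in the table would run on its output before any .ace lines are written.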

Notes

1. The Annotation Properties column can be filled with a pipe-separated list of "property_name = property_value" pairs. There will be a fixed vocabulary for the property names, and this list can be extended when necessary. The initial supported properties would be curator_name and annotation_identifier*, but the list can be extended to include, e.g., curator_ID, modification_date, creation_date, annotation_notes, etc.

* curator_name and annotation_identifier will be useful for groups that are using Protein2GO for protein annotation who wish to maintain their annotations in their own database. These values can be used to keep track of individual annotations.
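
Because the property names come from a fixed vocabulary, a parser can separate recognised properties from unexpected ones so that new names are noticed rather than silently dropped. A minimal sketch, assuming the vocabulary is limited to the property names mentioned on this page; the function name and the KNOWN_PROPERTIES set are hypothetical.

```python
# Assumption: only the property names mentioned on this page are recognised;
# the real controlled vocabulary may be larger.
KNOWN_PROPERTIES = {"curator_name", "annotation_identifier", "go_evidence"}

def parse_annotation_properties(text):
    """Split a pipe-separated property list into (known, unknown) dicts."""
    known, unknown = {}, {}
    for item in text.split("|"):
        if not item.strip():
            continue  # empty column or stray pipe
        name, sep, value = item.partition("=")
        name, value = name.strip(), value.strip()  # tolerate "name = value"
        target = known if sep and name in KNOWN_PROPERTIES else unknown
        target[name] = value
    return known, unknown
```

Anything returned in the second dict is a property name that would need to be added to the vocabulary or reported back before the value is used.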

Further questions/discussion points

1. Qualifiers column. a. Are the explicit relations mandatory? b. If so, what are they?

2. Evidence column. a. Chain of evidence

3. Annotation properties column. Tony has suggested including the GO evidence code here to avoid using a lookup to reverse engineer the file
