Difference between revisions of "GPAD to .ace file"
Revision as of 21:19, 7 February 2013
This page outlines how we'll go forward with converting the gpad file we get back from UniProt into a .ace file for upload to citace.
The final file specifications for the GPAD file are available here:
http://wiki.geneontology.org/index.php/Final_GPAD_and_GPI_file_format
Things to confirm:
- We get back from UniProt all C. elegans annotations, regardless of annotation source (e.g., WormBase, UniProt, IntAct).
- We get back all data in the with/from column (e.g., variations, RNAi experiments).
A table mapping gpad columns to .ace:
gp_association files (GPAD)
N.B. The first line in the gp_association file should be: !gpa-version: 1.1
Final format (09 Jan 2013)
column | name | required? | cardinality | old column # | extra info
---|---|---|---|---|---
1 | DB | required | 1 | 1 | must be in xrf_abbs
2 | DB_Object_ID | required | 1 | 2 | canonical or spliceform ID
3 | Qualifier | required | 0 or greater | 4 | qualifiers to be confirmed
4 | GO ID | required | 1 | 5 | must be extant GO ID
5 | DB:Reference(s) | required | 1 or greater | 6 | DB must be in xrf_abbs
6 | Evidence code | required | 1 | 7 | from ECO
7 | With (or) From | optional | 0 or greater | 8 |
8 | Interacting taxon ID (for multi-organism processes) | optional | 0 or 1 | 13 | NCBI taxon ID
9 | Date | required | 1 | 14 | YYYYMMDD
10 | Assigned_by | required | 1 | 15 | from xrf_abbs
11 | Annotation Extension | optional | 0 or greater | 16 |
12 | Annotation Properties | optional | 0 or greater | | See Note 1 below
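As a starting point for the conversion script, the column layout above could be read roughly as follows. This is only a sketch under stated assumptions: it assumes columns are tab-separated and that multi-valued cells (qualifiers, references, with/from) are pipe-separated, neither of which is stated on this page. The names `GpadRecord`, `check_header` and `parse_gpad_line` are hypothetical, and the .ace output step is deliberately left out because the mapping is not yet defined here.

```python
import re
from typing import List, NamedTuple, Optional

# Header line required as the first line of a gp_association file.
GPAD_HEADER = "!gpa-version: 1.1"

# 0-based indexes of the columns that must hold exactly one value
# (DB, DB_Object_ID, GO ID, Evidence code, Date, Assigned_by).
SINGLE_REQUIRED = {0: "DB", 1: "DB_Object_ID", 3: "GO ID",
                   5: "Evidence code", 8: "Date", 9: "Assigned_by"}


class GpadRecord(NamedTuple):
    db: str
    db_object_id: str
    qualifiers: List[str]
    go_id: str
    references: List[str]
    evidence_code: str
    with_from: List[str]
    interacting_taxon: Optional[str]
    date: str
    assigned_by: str
    annotation_extension: str   # kept raw; its internal syntax is not covered here
    annotation_properties: str  # kept raw; see Note 1


def check_header(first_line: str) -> bool:
    """Return True if the first line of the file is the expected version line."""
    return first_line.strip() == GPAD_HEADER


def split_multi(cell: str) -> List[str]:
    """Split a multi-valued cell on '|' (assumed separator); empty cell -> []."""
    return [value for value in cell.split("|") if value]


def parse_gpad_line(line: str) -> GpadRecord:
    """Parse one data line into a GpadRecord, checking the required columns."""
    cols = line.rstrip("\n").split("\t")  # assumption: tab-separated columns
    if len(cols) > 12:
        raise ValueError(f"expected at most 12 columns, got {len(cols)}")
    cols += [""] * (12 - len(cols))       # tolerate missing trailing optional columns

    for index, name in SINGLE_REQUIRED.items():
        if not cols[index]:
            raise ValueError(f"required column '{name}' is empty")

    references = split_multi(cols[4])
    if not references:
        raise ValueError("required column 'DB:Reference(s)' is empty")
    if not re.fullmatch(r"\d{8}", cols[8]):
        raise ValueError(f"Date must be YYYYMMDD, got {cols[8]!r}")

    return GpadRecord(
        db=cols[0],
        db_object_id=cols[1],
        qualifiers=split_multi(cols[2]),
        go_id=cols[3],
        references=references,
        evidence_code=cols[5],
        with_from=split_multi(cols[6]),
        interacting_taxon=cols[7] or None,
        date=cols[8],
        assigned_by=cols[9],
        annotation_extension=cols[10],
        annotation_properties=cols[11],
    )
```

The checks cover only presence and the date format; validating DB prefixes against xrf_abbs, or GO and ECO IDs against the ontologies, would need those resources to be loaded separately.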
Notes
1. The Annotation Properties column can be filled with a pipe-separated list of "property_name = property_value" pairs. The property names will come from a fixed vocabulary, which can be extended when necessary. The initial supported properties would be curator_name and annotation_identifier*, but the list can be extended to include, e.g., curator_ID, modification_date, creation_date and annotation_notes.
* curator_name and annotation_identifier will be useful for groups that use Protein2GO for protein annotation and wish to maintain their annotations in their own database. These values can be used to keep track of individual annotations.
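The Annotation Properties cell described in Note 1 could be unpacked along these lines. This is a minimal sketch: the function name `parse_annotation_properties` is hypothetical, and stripping the whitespace around "=" is an assumption, since the note does not say whether those spaces are literal. Values are collected in lists in case a property name appears more than once.

```python
from typing import Dict, List


def parse_annotation_properties(cell: str) -> Dict[str, List[str]]:
    """Turn 'name = value|name = value' into {name: [value, ...]}."""
    properties: Dict[str, List[str]] = {}
    for item in cell.split("|"):
        item = item.strip()
        if not item:
            continue  # empty cell or stray separator
        name, separator, value = item.partition("=")
        if not separator:
            raise ValueError(f"property without '=': {item!r}")
        # Assumption: whitespace around '=' is not significant.
        properties.setdefault(name.strip(), []).append(value.strip())
    return properties
```

For example, a cell holding `curator_name = Jane Doe|annotation_identifier = 2013-0001` (made-up values) would yield one entry per property name, which the conversion step could then use to track individual annotations.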
Further questions/discussion points
1. Qualifiers column.
   a. Are the explicit relations mandatory?
   b. If so, what are they?
2. Evidence column.
   a. Chain of evidence
3. Annotation properties column.
   Tony has suggested including the GO evidence code here to avoid using a lookup to reverse-engineer the file.