GOC GPAD/GPI 2.0 Specifications
https://github.com/geneontology/go-annotation/blob/master/specs/gpad-gpi-2-0.md
OA Annotations - this work is done
Mapping from gop_ postgres tables to GPAD 2.0 columns
Skip all entries that have the value 'False Positive' in gop_falsepositive
gop_ postgres table name | GPAD 2.0 column | Action | Example
gop_wbgene | 1 | Preface each value with 'WB:' | WB:WBGene00006925
gop_qualifier | 2 | This is the NEGATION field in the GPAD 2.0 file, but no action is needed as we don't have any negation in our GO OA annotations. | n/a
gop_qualifier | 3 | Map text value to Relations Ontology (RO) term id (see table below); populate with RO id. | RO:0001025
gop_goid | 4 | Add GO term id as it exists in table. | GO:0051306
gop_accession | 5 | Remove quotes and add id in table; we only have seven of these. | GO_REF:0000015
gop_paper | 5 | Add WBPaper ID and corresponding PMID, pipe separated. | PMID:10978280|WB:WBPaper00004310
gop_goinference | 6 | Map three-letter GO code to ECO code (see below); add ECO id. | ECO:0000314
gop_with_wbgene | 7 | Preface each value with 'WB:'; comma-separate multiple values | WB:WBGene00000001
gop_with | 7 | Add id as it exists in table; comma-separate multiple values | FB:FBgn0003719
gop_with_phenotype | 7 | Add id as it exists in table; comma-separate multiple values | WBPhenotype:0000689
gop_with_rnai | 7 | Preface each value with 'WB:'; comma-separate multiple values | WB:WBRNAi00001974
gop_with_wbvariation | 7 | Preface each value with 'WB:'; comma-separate multiple values | WB:WBVar00242156
- | 8 | No action; I don't think we have any values for an interacting taxon. |
gop_lastupdate | 9 | If YYYY-MM-DD, add as it exists in table. If YYYY-MM-DD HH:MM:SS, convert to YYYY-MM-DDTHH:MM | 2020-05-13 or 2006-02-03T12:26
no OA table | 10 | Add WB | WB
gop_xrefto | 11 | Convert relation name to RO id; add the value in parentheses directly after the RO id. | RO:0002233(WB:WBGene00000584)
?? | 12 | Add postgres annotation id, prefixed with 'id=WBOA:' | id=WBOA:3565
gop_curator | 12 | If available, map curator to ORCID and prefix with 'contributor-id=https://orcid.org/'. If no ORCID, add 'contributor-id=GOC:cab1' | contributor-id=https://orcid.org/0000-0002-1478-7671 or contributor-id=GOC:cab1
gop_comment | 12 | Add free text, prefixed with 'comment=' | comment=2020-03-17; flagged FP prior to Noctua upload; no ISS With/From; more specific PAINT annotation exists.
gop_lastupdate | 12 | Add creation-date=YYYY-MM-DD (or YYYY-MM-DDTHH:MM) | creation-date=2021-06-29 (or creation-date=2021-07-15T16:52)
gop_lastupdate | 12 | Add modification-date=YYYY-MM-DD (or YYYY-MM-DDTHH:MM) | modification-date=2021-06-29 (or modification-date=2021-07-15T16:52)
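A minimal Python sketch of the date handling (column 9) and the column 12 assembly described in the table above. The function names and arguments are assumptions for illustration, not the actual export script; it also assumes that a postgres timestamp may carry fractional seconds or a timezone suffix, which are simply cut off.

```python
from datetime import datetime


def format_gpad_date(value):
    """Keep YYYY-MM-DD as is; convert 'YYYY-MM-DD HH:MM:SS' to YYYY-MM-DDTHH:MM."""
    value = value.strip()
    if len(value) == 10:
        datetime.strptime(value, "%Y-%m-%d")  # validation only
        return value
    # Assumption: anything after the seconds (fractions, timezone) is ignored.
    parsed = datetime.strptime(value[:19], "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m-%dT%H:%M")


def build_properties(pgid, orcid, comment, lastupdate):
    """Assemble the pipe-separated Annotation properties value (column 12)."""
    date = format_gpad_date(lastupdate)
    props = [f"id=WBOA:{pgid}"]
    if orcid:
        props.append(f"contributor-id=https://orcid.org/{orcid}")
    else:
        # Fallback given in the table when the curator has no ORCID.
        props.append("contributor-id=GOC:cab1")
    if comment:
        props.append(f"comment={comment}")
    # Both dates come from gop_lastupdate, as in the table.
    props.append(f"creation-date={date}")
    props.append(f"modification-date={date}")
    return "|".join(props)
```

For example, `build_properties(3565, "0000-0002-1478-7671", "", "2021-06-29")` yields an id, a contributor-id, and the two date properties joined by pipes.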
Mapping gene product-to-term relation names to RO ids.
qualifier name (gop_qualifier) | RO ID | number of annotations in WS280 ace file (353 annotations total)
acts_upstream_of_or_within | RO:0002264 | 9
located_in | RO:0001025 | 10
involved_in | RO:0002331 | 307
enables | RO:0002327 | 18
part_of | BFO:0000050 | 8
- Note: no instances of these gp2term relations in the OA:
- colocalizes_with (RO:0002325)
- contributes_to (RO:0002326)
- Note: found one annotation coming from the OA that lacked a gp2term relation; updated that for WS281
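The qualifier mapping above could be held in a simple lookup, sketched here in Python. The names `QUALIFIER_TO_RO` and `qualifier_to_ro` are hypothetical; the two relations with no instances in the OA are included so that they would still map if they ever appear. A blank or unknown qualifier raises an error so it can be fixed at the source, as was done for the one annotation that lacked a relation.

```python
# gp2term relation name -> ontology id, taken from the table and notes above.
QUALIFIER_TO_RO = {
    "acts_upstream_of_or_within": "RO:0002264",
    "located_in": "RO:0001025",
    "involved_in": "RO:0002331",
    "enables": "RO:0002327",
    "part_of": "BFO:0000050",
    "colocalizes_with": "RO:0002325",
    "contributes_to": "RO:0002326",
}


def qualifier_to_ro(name):
    """Return the ontology id for a gop_qualifier value, or raise ValueError."""
    # Assumption: values may be stored with spaces instead of underscores.
    key = (name or "").strip().lower().replace(" ", "_")
    if key not in QUALIFIER_TO_RO:
        raise ValueError(f"no RO mapping for qualifier {name!r}")
    return QUALIFIER_TO_RO[key]
```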
Mapping annotation extension relations to RO ids
relation label | RO ID | number of annotations in WS280 ace file (353 annotations total)
has_input | RO:0002233 | 10
happens_during | RO:0002092 | 4
occurs_in | BFO:0000066 | 1
part_of | BFO:0000050 | 8
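A sketch of the annotation extension conversion (column 11) using the table above. It assumes each extension arrives as `relation(value)`, for example `has_input(WB:WBGene00000584)`; the actual layout of gop_xrefto may differ. Relations that are already ids are passed through unchanged, and unmapped text relations raise an error so they can be reported.

```python
import re

# Annotation extension relation label -> ontology id, from the table above.
EXTENSION_RELATION_TO_ID = {
    "has_input": "RO:0002233",
    "happens_during": "RO:0002092",
    "occurs_in": "BFO:0000066",
    "part_of": "BFO:0000050",
}

# Assumed input shape: relation, then the value in parentheses.
_EXTENSION = re.compile(r"^\s*([^()]+?)\s*\((.+)\)\s*$")


def convert_extension(extension):
    """Return the extension with its relation label replaced by an ontology id."""
    match = _EXTENSION.match(extension)
    if match is None:
        raise ValueError(f"unrecognised extension format: {extension!r}")
    relation, value = match.groups()
    if ":" in relation:
        # Already an id such as RO:0002233; leave as is.
        return f"{relation}({value})"
    relation_id = EXTENSION_RELATION_TO_ID.get(relation.replace(" ", "_"))
    if relation_id is None:
        raise ValueError(f"no ontology id for relation {relation!r}")
    return f"{relation_id}({value})"
```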
Mapping three-letter GO codes to ECO ids.
three-letter GO code | ECO ID
ISS | ECO:0000250
IEP | ECO:0000270
NAS | ECO:0000303
TAS | ECO:0000304
IC | ECO:0000305
ND | ECO:0000307
IDA | ECO:0000314
IMP | ECO:0000315
IGI | ECO:0000316
IPI | ECO:0000353
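The evidence code mapping above, written as a Python lookup for column 6. The names are hypothetical; codes not listed in the table raise a `KeyError` rather than being guessed.

```python
# GO evidence code -> ECO id, from the table above.
GO_CODE_TO_ECO = {
    "ISS": "ECO:0000250",
    "IEP": "ECO:0000270",
    "NAS": "ECO:0000303",
    "TAS": "ECO:0000304",
    "IC": "ECO:0000305",
    "ND": "ECO:0000307",
    "IDA": "ECO:0000314",
    "IMP": "ECO:0000315",
    "IGI": "ECO:0000316",
    "IPI": "ECO:0000353",
}


def evidence_to_eco(code):
    """Return the ECO id for a gop_goinference value."""
    return GO_CODE_TO_ECO[code.strip().upper()]
```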
Questions
- What about blank OA entries, e.g. pgid 14222?
- Blank entries were ignored.
Protein2GO Annotations
- Other mappings needed:
- PMID to WBPaper
- Relation text to gorel id
- Curators without orcids to GOC abbreviations
GPAD 2.0 column number | GPAD 2.0 column name | Action | UniProt Source File Example | WormBase Output File Example | Report on parsing failures
1 | Annotated entity* | Convert each UniProtKB: accession to a WBGene id using the latest WB gpi file | UniProtKB:G5ED58 | WB:WBGene00006925 | Yes
2 | Negation | Leave as is | NOT | NOT | No
3 | Qualifier | Leave as is | RO:0002327 | RO:0002327 | No
4 | GO term ID | Leave as is | GO:0051306 | GO:0051306 | No
5 | Reference* | Leave GO_REFs as is; map PMID or DOI to corresponding WBPaper id and add WBPaper id as a pipe-separated value | PMID:10978280 | PMID:10978280|WB:WBPaper00004310 | Yes
6 | Evidence | Leave as is | ECO:0000314 | ECO:0000314 | No
7 | With/From* | Leave as is, except for UniProtKB: accessions; for UniProtKB: accessions, try to map to a WBGene id using the latest WB gpi file; if a UniProtKB: accession doesn't map to a WBGene id, then leave as is | UniProtKB:D9PTP8 | WB:WBGene00013354 | Yes - output a list of UniProtKB accessions that didn't map to a WBGene
8 | Interacting taxon | Leave as is | NCBITaxon:273526 | NCBITaxon:273526 | No
9 | Annotation date | Leave as is | 2006-02-03T12:26 | 2006-02-03T12:26 | No
10 | Assigned_by | Leave as is | WB | WB | No
11 | Annotation extensions | If relation is a text string, convert to an id according to the RO mapping table above. Otherwise, leave as is. | RO:0002233(WB:WBGene00000584) | RO:0002233(WB:WBGene00000584) | Yes - report on any relation text strings that don't map to an ontology id.
12 | Annotation properties* | Most will stay as is, except for history (see below). | contributor-id=https://orcid.org/0000-0002-1706-4196|comment=action:Updated by Kimberly Van Auken|model-state=???
Annotated Entity
- Use latest WormBase gpi file to map UniProtKB accessions (column 1 in incoming GPAD file; column 9 in WB gpi file) to corresponding WBGene ID (column 2 in WB gpi file)
- Latest WormBase gpi file: ftp://ftp.wormbase.org/pub/wormbase/species/c_elegans/PRJNA13758/annotation/gene_product_info/c_elegans.canonical_bioproject.current_development.gene_product_info.gpi.gz
- For input values that include a '-digit', e.g. UniProtKB:P37806-1, strip the '-digit' for the purposes of mapping to a WBGene and add a comment in the Annotation properties field: comment=Original annotation made to whatever the full UniProtKB accession is. For example: comment=Original annotation made to UniProtKB:P37806-1.
- We have made some manual annotations to organisms other than C. elegans, so there will be some GPAD lines for which there is no mapping from the UniProtKB accession to a WBGene id.
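The annotated-entity steps above can be sketched in Python as follows. The function names are hypothetical. The sketch assumes the gpi file is tab-delimited with `!` header lines, that column 9 holds pipe-separated cross-references, and that column 2 may or may not already carry the `WB:` prefix; the trailing period in the example comment is treated as sentence punctuation and left out.

```python
import re


def load_uniprot_to_wbgene(gpi_lines):
    """Build a UniProtKB accession -> WBGene id lookup from WB gpi lines."""
    mapping = {}
    for line in gpi_lines:
        if line.startswith("!") or not line.strip():
            continue  # header or blank line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        gene_id = cols[1] if cols[1].startswith("WB:") else f"WB:{cols[1]}"
        for xref in cols[8].split("|"):
            if xref.startswith("UniProtKB:"):
                mapping.setdefault(xref, gene_id)
    return mapping


_ISOFORM_SUFFIX = re.compile(r"-\d+$")


def map_annotated_entity(entity, mapping):
    """Return (WBGene id or None, comment property or None) for column 1."""
    base = _ISOFORM_SUFFIX.sub("", entity)
    gene_id = mapping.get(base)
    if gene_id is None:
        # No mapping, e.g. a non-C. elegans entry; caller should report it.
        return None, None
    comment = None
    if base != entity:
        comment = f"comment=Original annotation made to {entity}"
    return gene_id, comment
```

A `None` gene id marks the accession as unmapped so it can go into the parsing-failure report; the same lookup could be reused for UniProtKB accessions in With/From.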
References
- Map incoming doi or PMID to WBPaper id using the pap_identifier table.
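A sketch of the reference handling for column 5, in Python. How the lookup is built from the pap_identifier table is not shown here; `id_to_wbpaper` is a hypothetical dictionary keyed by the identifier as written in the GPAD file (for example `PMID:10978280`) with `WB:WBPaper...` values.

```python
def add_wbpaper(reference, id_to_wbpaper):
    """Append WBPaper ids to a column 5 value; return (new value, unmapped ids)."""
    refs = reference.split("|")
    output = list(refs)
    unmapped = []
    for ref in refs:
        if ref.startswith("GO_REF:"):
            continue  # GO_REFs are left as is
        if ref.upper().startswith(("PMID:", "DOI:")):
            paper = id_to_wbpaper.get(ref)
            if paper is None:
                unmapped.append(ref)  # goes into the parsing-failure report
            elif paper not in output:
                output.append(paper)
    return "|".join(output), unmapped
```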
With/From
Annotation Properties
- Contributor
- Most contributors are captured with an orcid.
- However, Carol and Josh do not have orcids, so we need to populate a GOC abbreviation for their contributor id.
- If contributor-id is blank and comment=action:Added by Josh Jaffery [Expired account], then populate contributor-id=GOC:jja
- If contributor-id is blank and comment=action:Added by Carol Bastiani [Expired account], then populate contributor-id=GOC:cab1
- History
- Group annotations by id, e.g. id=GOA:2113472118
- If more than one line with same id, check corresponding date field (column 9) for each line
- On the line with the most recent date in each group, add creation-date=YYYY-MM-DD from the earliest annotation and modification-date=YYYY-MM-DD for each subsequent date
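The contributor and history rules above could look roughly like this in Python. This is one reading of the rules, not a settled implementation: it assumes each row is a list of the 12 GPAD columns, that only the most recent line of a group is kept, and that single-line groups are left untouched. All names are hypothetical.

```python
from collections import defaultdict

# Comment text -> GOC abbreviation, for the two curators without orcids.
EXPIRED_ACCOUNT_TO_GOC = {
    "comment=action:Added by Josh Jaffery [Expired account]": "GOC:jja",
    "comment=action:Added by Carol Bastiani [Expired account]": "GOC:cab1",
}


def fill_contributor(properties):
    """Add a GOC contributor-id when none is set and the comment matches."""
    if any(p.startswith("contributor-id=") and p != "contributor-id=" for p in properties):
        return properties
    kept = [p for p in properties if p != "contributor-id="]
    for prop in kept:
        goc = EXPIRED_ACCOUNT_TO_GOC.get(prop)
        if goc:
            return kept + [f"contributor-id={goc}"]
    return kept


def collapse_history(rows):
    """Group rows by id= property; keep the latest row with history dates added."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        ann_id = next((p for p in row[11].split("|") if p.startswith("id=")), None)
        groups[ann_id or f"no-id-{index}"].append(row)
    collapsed = []
    for group in groups.values():
        if len(group) == 1:
            collapsed.append(group[0])
            continue
        # ISO dates sort correctly as strings (column 9 is index 8).
        dates = sorted({row[8] for row in group})
        latest = list(max(group, key=lambda row: row[8]))
        props = latest[11].split("|")
        props.append(f"creation-date={dates[0][:10]}")
        props.extend(f"modification-date={date[:10]}" for date in dates[1:])
        latest[11] = "|".join(props)
        collapsed.append(latest)
    return collapsed
```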
Final File for Import
- The final files for import will be one file of OA annotations and one file of Protein2GO GPAD annotations.
- We'll need to either make the files available somewhere for Dustin to pick up for the import, or I can gzip them and email them to him.