GOC GPAD/GPI 2.0 Specifications
https://github.com/geneontology/go-annotation/blob/master/specs/gpad-gpi-2-0.md
OA Annotations - this work is done
Mapping from gop_ postgres tables to GPAD 2.0 columns
Skip all entries that have the value 'False Positive' in gop_falsepositive
| gop_ postgres table name | GPAD 2.0 column | Action | Example |
| --- | --- | --- | --- |
| gop_wbgene | 1 | Preface each value with 'WB:' | WB:WBGene00006925 |
| gop_qualifier | 2 | This is the NEGATION field in the GPAD 2.0 file, but no action is needed because we don't have any negation in our GO OA annotations. | n/a |
| gop_qualifier | 3 | Map text value to Relations Ontology (RO) term id (see table below); populate with RO id. | RO:0001025 |
| gop_goid | 4 | Add GO term id as it exists in table. | GO:0051306 |
| gop_accession | 5 | Remove quotes and add id in table; we only have seven of these. | GO_REF:0000015 |
| gop_paper | 5 | Add WBPaper ID and corresponding PMID, pipe separated. | PMID:10978280\|WB:WBPaper00004310 |
| gop_goinference | 6 | Map three-letter GO code to ECO code (see below); add ECO id. | ECO:0000314 |
| gop_with_wbgene | 7 | Preface each value with 'WB:'; comma-separate multiple values | WB:WBGene00000001 |
| gop_with | 7 | Add id as it exists in table; comma-separate multiple values | FB:FBgn0003719 |
| gop_with_phenotype | 7 | Add id as it exists in table; comma-separate multiple values | WBPhenotype:0000689 |
| gop_with_rnai | 7 | Preface each value with 'WB:'; comma-separate multiple values | WB:WBRNAi00001974 |
| gop_with_wbvariation | 7 | Preface each value with 'WB:'; comma-separate multiple values | WB:WBVar00242156 |
| - | 8 | No action; I don't think we have any values for an interacting taxon. | |
| gop_lastupdate | 9 | If YYYY-MM-DD, add as it exists in table. If YYYY-MM-DD HH:MM:SS, convert to YYYY-MM-DDTHH:MM | 2020-05-13 or 2006-02-03T12:26 |
| no OA table | 10 | Add WB | WB |
| gop_xrefto | 11 | Convert relation name to RO id, then add the value in parentheses directly after the RO id. | RO:0002233(WB:WBGene00000584) |
| ?? | 12 | Add postgres annotation id, prefixed with 'id=WBOA:' | id=WBOA:3565 |
| gop_curator | 12 | If available, map curator to ORCID and prefix with 'contributor-id=https://orcid.org/'. If no ORCID, add 'contributor-id=GOC:cab1' | contributor-id=https://orcid.org/0000-0002-1478-7671 or contributor-id=GOC:cab1 |
| gop_comment | 12 | Add free text, prefixed with 'comment=' | comment=2020-03-17; flagged FP prior to Noctua upload; no ISS With/From; more specific PAINT annotation exists. |
| gop_lastupdate | 12 | Add creation-date=YYYY-MM-DD (or YYYY-MM-DDTHH:MM) | creation-date=2021-06-29 (or creation-date=2021-07-15T16:52) |
| gop_lastupdate | 12 | Add modification-date=YYYY-MM-DD (or YYYY-MM-DDTHH:MM) | modification-date=2021-06-29 (or modification-date=2021-07-15T16:52) |
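The date rule for gop_lastupdate (column 9) can be sketched as below. This is only an illustration: the function name `normalize_oa_date` is hypothetical, and it assumes the table holds exactly the two forms named above (a bare date, or a timestamp with whole seconds). Values with fractional seconds or a time-zone offset would need extra handling.

```python
from datetime import datetime


def normalize_oa_date(value: str) -> str:
    """Convert a gop_lastupdate value to the GPAD 2.0 date form (hypothetical helper)."""
    value = value.strip()
    try:
        # Date only: keep it as it exists in the table.
        return datetime.strptime(value, "%Y-%m-%d").strftime("%Y-%m-%d")
    except ValueError:
        pass
    # Timestamp: drop the seconds and join date and time with 'T'.
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m-%dT%H:%M")
```

Any value in neither form raises `ValueError`, which makes unexpected rows visible instead of silently passing them through.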
Mapping gene product-to-term relation names to RO ids.
| qualifier name (gop_qualifier) | RO ID | number of annotations in WS280 ace file (353 annotations total) |
| --- | --- | --- |
| acts_upstream_of_or_within | RO:0002264 | 9 |
| located_in | RO:0001025 | 10 |
| involved_in | RO:0002331 | 307 |
| enables | RO:0002327 | 18 |
| part_of | BFO:0000050 | 8 |
- Note: no instances of these gp2term relations in the OA:
- colocalizes_with (RO:0002325)
- contributes_to (RO:0002326)
- Note: found one annotation coming from the OA that lacked a gp2term relation; updated that for WS281
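A minimal sketch of the qualifier lookup, using the ids from the table and notes above. The function name `map_qualifiers` and the `(pgid, qualifier)` row shape are assumptions made for illustration. Rows with a missing or unrecognised relation are collected separately so they can be reported, as with the one annotation that lacked a gp2term relation.

```python
# Relation name -> ontology id, taken from the mapping table and notes above.
QUALIFIER_TO_RO = {
    "acts_upstream_of_or_within": "RO:0002264",
    "located_in": "RO:0001025",
    "involved_in": "RO:0002331",
    "enables": "RO:0002327",
    "part_of": "BFO:0000050",
    "colocalizes_with": "RO:0002325",
    "contributes_to": "RO:0002326",
}


def map_qualifiers(rows):
    """Map (pgid, qualifier) pairs to ontology ids.

    Returns (mapped, unmapped): a dict of pgid -> id, and a list of pgids
    whose qualifier was blank or not in the table. Hypothetical helper.
    """
    mapped = {}
    unmapped = []
    for pgid, qualifier in rows:
        ro_id = QUALIFIER_TO_RO.get((qualifier or "").strip())
        if ro_id is None:
            unmapped.append(pgid)
        else:
            mapped[pgid] = ro_id
    return mapped, unmapped
```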
Mapping annotation extension relations to RO ids
| relation label | RO ID | number of annotations in WS280 ace file (353 annotations total) |
| --- | --- | --- |
| has_input | RO:0002233 | 10 |
| happens_during | RO:0002092 | 4 |
| occurs_in | BFO:0000066 | 1 |
| part_of | BFO:0000050 | 8 |
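The conversion of an annotation extension from a relation label to its id could look like the sketch below. It assumes the stored value has the shape `relation(ID)`, for example `has_input(WB:WBGene00000584)`; the actual layout of gop_xrefto values is not stated here, so treat that shape and the name `convert_extension` as assumptions. Values whose relation is already an ontology id are passed through unchanged.

```python
import re

# Relation label -> ontology id, from the annotation extension table above.
EXTENSION_RELATION_TO_ID = {
    "has_input": "RO:0002233",
    "happens_during": "RO:0002092",
    "occurs_in": "BFO:0000066",
    "part_of": "BFO:0000050",
}

# Assumed shape: relation(target), with optional surrounding whitespace.
_EXTENSION = re.compile(r"^\s*([^()\s]+)\s*\(([^()]+)\)\s*$")
_ONTOLOGY_ID = re.compile(r"[A-Za-z]+:\d+")


def convert_extension(text: str) -> str:
    """Return the extension with its relation expressed as an ontology id."""
    match = _EXTENSION.match(text)
    if match is None:
        raise ValueError(f"unrecognised extension: {text!r}")
    relation, target = match.group(1), match.group(2).strip()
    if relation in EXTENSION_RELATION_TO_ID:
        relation = EXTENSION_RELATION_TO_ID[relation]
    elif not _ONTOLOGY_ID.fullmatch(relation):
        # Text relation with no mapping: surface it for the failure report.
        raise KeyError(f"no ontology id for relation: {relation!r}")
    return f"{relation}({target})"
```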
Mapping three-letter GO codes to ECO ids.
| three-letter GO code | ECO ID |
| --- | --- |
| ISS | ECO:0000250 |
| IEP | ECO:0000270 |
| NAS | ECO:0000303 |
| TAS | ECO:0000304 |
| IC | ECO:0000305 |
| ND | ECO:0000307 |
| IDA | ECO:0000314 |
| IMP | ECO:0000315 |
| IGI | ECO:0000316 |
| IPI | ECO:0000353 |
Questions
- What about blank OA entries, e.g. pgid 14222?
- Blank entries were ignored.
Protein2GO Annotations
- Other mappings needed:
- PMID to WBPaper
- Relation text to gorel id
- Curators without orcids to GOC abbreviations
| GPAD 2.0 column number | GPAD 2.0 column name | Action | UniProt Source File Example | WormBase Output File Example | Report on parsing failures |
| --- | --- | --- | --- | --- | --- |
| 1 | Annotated entity* | UniProtKB: convert accession to a WBGene id using the latest WB gpi file | UniProtKB:G5ED58 | WB:WBGene00006925 | Yes |
| 1 | Annotated entity* | UniProtKB: strip the '-digit' suffix, then convert to a WBGene id using the latest WB gpi file | UniProtKB:P34708-1 | WB:WBGene00006604 | Yes |
| 1 | Annotated entity* | ComplexPortal: Ignore | ComplexPortal:CPX-1000 | n/a | n/a |
| 1 | Annotated entity* | RNAcentral: Ignore | RNAcentral:URS00000082FF_6239 | n/a | n/a |
| 2 | Negation | Leave as is | NOT | NOT | No |
| 3 | Qualifier | Leave as is | RO:0002327 | RO:0002327 | No |
| 4 | GO term ID | Leave as is | GO:0051306 | GO:0051306 | No |
| 5 | Reference* | Leave GO_REFs as is; map PMID or DOI to corresponding WBPaper id and add WBPaper id as a pipe-separated value | PMID:10978280 | PMID:10978280\|WB:WBPaper00004310 | Yes |
| 6 | Evidence | Leave as is | ECO:0000314 | ECO:0000314 | No |
| 7 | With/From* | Leave as is, except for UniProtKB: accessions; for UniProtKB: accessions, try to map to a WBGene id using the latest WB gpi file; if UniProtKB: accession doesn't map to a WBGene id, then leave as is | UniProtKB:D9PTP8 | WB:WBGene00013354 | Yes - output a list of UniProtKB accessions that didn't map to a WBGene |
| 8 | Interacting taxon | Leave as is | NCBITaxon:273526 | NCBITaxon:273526 | No |
| 9 | Annotation date | Leave as is | 2006-02-03T12:26 | 2006-02-03T12:26 | No |
| 10 | Assigned_by | Leave as is | WB | WB | No |
| 11 | Annotation extensions | If relation is a text string, convert to an id according to the RO mapping table above. Otherwise, leave as is. | RO:0002233(WB:WBGene00000584) | RO:0002233(WB:WBGene00000584) | Yes - report on any relation text strings that don't map to an ontology id. |
| 12 | Annotation properties* | Most will stay as is, except for history (see below). | contributor-id=https://orcid.org/0000-0002-1706-4196\|comment=action:Updated by Kimberly Van Auken\|model-state=??? | | |
Annotated Entity
- Use latest WormBase gpi file to map UniProtKB accessions to WBGene ids (take column 1 value in incoming GPAD file and find corresponding column 9 value in WB gpi file) then map to corresponding WBGene ID (column 2 in WB gpi file)
- Latest WormBase gpi file: ftp://ftp.wormbase.org/pub/wormbase/species/c_elegans/PRJNA13758/annotation/gene_product_info/c_elegans.canonical_bioproject.current_development.gene_product_info.gpi.gz
- For input values that include a '-digit', e.g. UniProtKB:P37806-1, strip the '-digit' for the purposes of mapping to a WBGene and add a comment in the Annotation properties field that records the full original UniProtKB accession. For example: comment=Original annotation made to UniProtKB:P37806-1.
- We will be ignoring any GPAD lines that have ComplexPortal or RNAcentral ids in Column 1.
- We have made some manual annotations to organisms other than C. elegans, so there will be some GPAD lines for which there is no mapping from the UniProtKB accession to a WBGene id but for which we want to migrate the annotation to Noctua. To find these, we'll need to check if the unmappable UniProtKB accession is also in the 6239 GPAD file. If yes, discard; if no, keep. Report on which ones we kept and which ones we discarded.
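The Annotated Entity steps above can be sketched as follows. This is a rough illustration under several assumptions: the gpi file is tab-separated with '!' comment lines, column 2 holds the WBGene id, column 9 holds pipe-separated cross-references including the UniProtKB accession, and the '-digit' suffix may be more than one digit. All function names are hypothetical.

```python
import re

IGNORED_PREFIXES = ("ComplexPortal:", "RNAcentral:")
_ISOFORM_SUFFIX = re.compile(r"-\d+$")


def load_uniprot_to_wbgene(gpi_lines):
    """Build UniProtKB accession -> WB gene id from gpi lines (columns 9 and 2)."""
    mapping = {}
    for line in gpi_lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header/comment and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        gene = cols[1]
        wb_id = gene if gene.startswith("WB:") else f"WB:{gene}"
        for xref in cols[8].split("|"):
            if xref.startswith("UniProtKB:"):
                mapping.setdefault(xref, wb_id)
    return mapping


def map_entity(entity, mapping):
    """Return (status, wb_id, comment); status is 'mapped', 'unmapped' or 'ignored'."""
    if entity.startswith(IGNORED_PREFIXES):
        return "ignored", None, None
    base = _ISOFORM_SUFFIX.sub("", entity)
    comment = None
    if base != entity:
        # Record the full original accession for the Annotation properties field.
        comment = f"comment=Original annotation made to {entity}"
    wb_id = mapping.get(base)
    if wb_id is None:
        return "unmapped", None, comment
    return "mapped", wb_id, comment


def keep_unmapped(accession, elegans_accessions):
    """Keep an unmappable accession only if it is absent from the 6239 GPAD file."""
    return accession not in elegans_accessions
```

Collecting the 'unmapped' results, together with the keep/discard decision for each, gives the report on kept and discarded lines.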
References
- Map incoming doi or PMID to WBPaper id using the pap_identifier table.
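As a sketch of the reference rule, the function below appends the WBPaper id to a column 5 value and leaves GO_REFs alone. It assumes a lookup dict has already been built from the pap_identifier table, keyed by identifiers in their GPAD form (such as 'PMID:10978280') with 'WB:'-prefixed values; that key form, the 'DOI:' prefix, and the name `add_wbpaper` are assumptions, not the table's actual layout.

```python
def add_wbpaper(reference, paper_lookup):
    """Append WBPaper ids to a pipe-separated reference value.

    Returns (new_reference, unmapped_ids) so unmapped PMIDs/DOIs can be reported.
    """
    ids = reference.split("|")
    out = list(ids)
    unmapped = []
    for ref_id in ids:
        if ref_id.startswith("GO_REF:"):
            continue  # GO_REFs are left as is
        if ref_id.startswith(("PMID:", "DOI:")):
            wbpaper = paper_lookup.get(ref_id)
            if wbpaper is None:
                unmapped.append(ref_id)
            elif wbpaper not in out:
                out.append(wbpaper)
    return "|".join(out), unmapped
```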
With/From
Annotation Properties
- Contributor
- Most contributors are captured with an orcid.
- However, Carol and Josh do not have orcids, so we need to populate a GOC abbreviation for their contributor id.
- If contributor-id is blank and comment=action:Added by Josh Jaffery [Expired account], then populate contributor-id=GOC:jja
- If contributor-id is blank and comment=action:Added by Carol Bastiani [Expired account], then populate contributor-id=GOC:cab1
- History
- Group annotations by id, e.g. id=GOA:2113472118
- If more than one line with same id, check corresponding date field (column 9) for each line
- On the line with the most recent date in each group, add creation-date=YYYY-MM-DD taken from the earliest annotation, and add modification-date=YYYY-MM-DD for each subsequent date
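The contributor and history rules above can be sketched together. The code assumes column 12 is a pipe-separated list of key=value pairs with no literal '|' inside a value, and that column 9 dates are ISO-style strings that sort correctly as text. The function names are hypothetical, and `history_properties` only computes the properties to add; the caller decides which groups (those with more than one line per id) it applies to.

```python
# Comment text -> GOC abbreviation for the two contributors without orcids.
EXPIRED_ACCOUNT_CONTRIBUTORS = {
    "action:Added by Josh Jaffery [Expired account]": "GOC:jja",
    "action:Added by Carol Bastiani [Expired account]": "GOC:cab1",
}


def parse_properties(text):
    """Split a column 12 value into (key, value) pairs; keys may repeat."""
    pairs = []
    for item in text.split("|"):
        if item:
            key, _, value = item.partition("=")
            pairs.append((key, value))
    return pairs


def fill_contributor(pairs):
    """Populate contributor-id from the comment when it is blank or missing."""
    if any(key == "contributor-id" and value for key, value in pairs):
        return pairs
    for key, value in pairs:
        if key == "comment" and value in EXPIRED_ACCOUNT_CONTRIBUTORS:
            kept = [p for p in pairs if p[0] != "contributor-id"]
            goc_id = EXPIRED_ACCOUNT_CONTRIBUTORS[value]
            return [("contributor-id", goc_id)] + kept
    return pairs


def history_properties(dates):
    """Given all column 9 dates for one annotation id, return the date properties.

    The earliest date becomes creation-date; each later date becomes a
    modification-date. Only the YYYY-MM-DD part is kept.
    """
    ordered = sorted(dates)
    props = [("creation-date", ordered[0][:10])]
    props += [("modification-date", d[:10]) for d in ordered[1:]]
    return props
```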
Final File for Import
- The final files for import will be one file of OA annotations and one file of Protein2GO annotations, both in GPAD format.
- We'll need to either make the files available somewhere for Dustin to pick up for the import, or I can gzip them and email them to him.