Noctua - Upload of WB Manual Annotations

From WormBaseWiki

GOC GPAD/GPI 2.0 Specifications

https://github.com/geneontology/go-annotation/blob/master/specs/gpad-gpi-2-0.md

OA Annotations - this work is done

Mapping from gop_ postgres tables to GPAD 2.0 columns

Skip all entries that have the value 'False Positive' in gop_falsepositive

gop_ postgres table name | GPAD 2.0 column | Action | Example
gop_wbgene | 1 | Preface each value with 'WB:' | WB:WBGene00006925
gop_qualifier | 2 | This is the NEGATION field in the GPAD 2.0 file, but no action is needed as we don't have any negation in our GO OA annotations. | n/a
gop_qualifier | 3 | Map text value to Relations Ontology (RO) term id (see table below); populate with RO id. | RO:0001025
gop_goid | 4 | Add GO term id as it exists in table. | GO:0051306
gop_accession | 5 | Remove quotes and add id as it exists in table; we only have seven of these. | GO_REF:0000015
gop_paper | 5 | Add WBPaper ID and corresponding PMID, pipe separated. | PMID:10978280|WB:WBPaper00004310
gop_goinference | 6 | Map three-letter GO code to ECO code (see below); add ECO id. | ECO:0000314
gop_with_wbgene | 7 | Preface each value with 'WB:'; comma-separate multiple values | WB:WBGene00000001
gop_with | 7 | Add id as it exists in table; comma-separate multiple values | FB:FBgn0003719
gop_with_phenotype | 7 | Add id as it exists in table; comma-separate multiple values | WBPhenotype:0000689
gop_with_rnai | 7 | Preface each value with 'WB:'; comma-separate multiple values | WB:WBRNAi00001974
gop_with_wbvariation | 7 | Preface each value with 'WB:'; comma-separate multiple values | WB:WBVar00242156
- | 8 | No action; I don't think we have any values for an interacting taxon. |
gop_lastupdate | 9 | If YYYY-MM-DD, add as it exists in table. If YYYY-MM-DD HH:MM:SS, convert to YYYY-MM-DDTHH:MM. | 2020-05-13 or 2006-02-03T12:26
no OA table | 10 | Add WB | WB
gop_xrefto | 11 | Convert relation name to RO id, add value, directly and parenthetically, after RO id. | RO:0002233(WB:WBGene00000584)
?? | 12 | Add postgres annotation id, prefixed with 'id=WBOA:' | id=WBOA:3565
gop_curator | 12 | If available, map curator to ORCID and prefix with 'contributor-id=https://orcid.org/'. If no ORCID, add 'contributor-id=GOC:cab1' | contributor-id=https://orcid.org/0000-0002-1478-7671 or contributor-id=GOC:cab1
gop_comment | 12 | Add free text, prefixed with 'comment=' | comment=2020-03-17; flagged FP prior to Noctua upload; no ISS With/From; more specific PAINT annotation exists.
gop_lastupdate | 12 | Add creation-date=YYYY-MM-DD (or YYYY-MM-DDTHH:MM) | creation-date=2021-06-29 (or creation-date=2021-07-15T16:52)
gop_lastupdate | 12 | Add modification-date=YYYY-MM-DD (or YYYY-MM-DDTHH:MM) | modification-date=2021-06-29 (or modification-date=2021-07-15T16:52)
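
The date handling (column 9) and the column 12 properties in the table above can be sketched roughly as follows. This is a minimal sketch and not the script that was actually run: the function names, the `curator_to_orcid` lookup, and the truncation of any fractional seconds or time zone on the postgres timestamp are assumptions. Joining the properties with '|' follows the pipe-separated style of the GPAD examples on this page.

```python
from datetime import datetime


def gpad_date(lastupdate: str) -> str:
    """Convert a gop_lastupdate value to the date form used in the GPAD file."""
    value = lastupdate.strip()
    if len(value) == 10:
        # Already YYYY-MM-DD: validate it and return it unchanged.
        datetime.strptime(value, "%Y-%m-%d")
        return value
    # YYYY-MM-DD HH:MM:SS -> YYYY-MM-DDTHH:MM (seconds are dropped).
    # Assumption: anything after the seconds (fractions, time zone) is ignored.
    parsed = datetime.strptime(value[:19], "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m-%dT%H:%M")


def annotation_properties(pgid, curator, comment, lastupdate, curator_to_orcid):
    """Assemble column 12 as pipe-separated property=value pairs.

    curator_to_orcid is a hypothetical dict from a gop_curator value to an
    ORCID; curators missing from it fall back to GOC:cab1, as in the table.
    """
    orcid = curator_to_orcid.get(curator)
    contributor = f"https://orcid.org/{orcid}" if orcid else "GOC:cab1"
    date = gpad_date(lastupdate)
    props = [f"id=WBOA:{pgid}", f"contributor-id={contributor}"]
    if comment:
        props.append(f"comment={comment}")
    # Both dates come from gop_lastupdate, as in the mapping table.
    props += [f"creation-date={date}", f"modification-date={date}"]
    return "|".join(props)
```

For example, `gpad_date("2006-02-03 12:26:41")` gives `2006-02-03T12:26`, matching the example in the column 9 row.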

Mapping gene product-to-term relation names to RO ids.

qualifier name (gop_qualifier) | RO ID | number of annotations in WS280 ace file (353 annotations total)
acts_upstream_of_or_within | RO:0002264 | 9
located_in | RO:0001025 | 10
involved_in | RO:0002331 | 307
enables | RO:0002327 | 18
part_of | BFO:0000050 | 8
  • Note: no instances of these gp2term relations in the OA:
    • colocalizes_with (RO:0002325)
    • contributes_to (RO:0002326)
  • Note: found one annotation coming from the OA that lacked a gp2term relation; updated that for WS281
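
A minimal sketch of the qualifier lookup, using the ids listed above (including the two relations that have no instances in the OA). The function name and the `(pgid, qualifier)` input shape are assumptions for illustration. Rows with a missing or unknown qualifier are collected for a report, which is how an annotation lacking a gp2term relation would surface.

```python
QUALIFIER_TO_ID = {
    "acts_upstream_of_or_within": "RO:0002264",
    "located_in": "RO:0001025",
    "involved_in": "RO:0002331",
    "enables": "RO:0002327",
    "part_of": "BFO:0000050",
    # No instances in the OA, listed for completeness:
    "colocalizes_with": "RO:0002325",
    "contributes_to": "RO:0002326",
}


def map_qualifiers(rows):
    """rows: iterable of (pgid, qualifier) pairs (hypothetical input shape).

    Returns (mapped, unmapped): mapped is {pgid: relation id}; unmapped is
    the list of pgids whose qualifier is blank or not in the table.
    """
    mapped, unmapped = {}, []
    for pgid, qualifier in rows:
        relation_id = QUALIFIER_TO_ID.get((qualifier or "").strip())
        if relation_id is None:
            unmapped.append(pgid)
        else:
            mapped[pgid] = relation_id
    return mapped, unmapped
```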

Mapping annotation extension relations to RO ids

relation label | RO ID | number of annotations in WS280 ace file (353 annotations total)
has_input | RO:0002233 | 10
happens_during | RO:0002092 | 4
occurs_in | BFO:0000066 | 1
part_of | BFO:0000050 | 8
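
A sketch of the column 11 conversion, assuming each extension is available as a single `relation(target)` string such as `has_input(WB:WBGene00000584)`. That input form, the function name, and the regular expressions are assumptions; how gop_xrefto actually stores the relation and its value is not described here. Relations that are already ontology ids are passed through, and anything that cannot be converted returns `None` so it can be reported.

```python
import re

EXTENSION_RELATION_TO_ID = {
    "has_input": "RO:0002233",
    "happens_during": "RO:0002092",
    "occurs_in": "BFO:0000066",
    "part_of": "BFO:0000050",
}

# relation(target), e.g. has_input(WB:WBGene00000584)
EXTENSION_RE = re.compile(r"^\s*([^()\s]+)\(([^()]+)\)\s*$")
# A relation that is already an ontology id, e.g. RO:0002233
ID_RE = re.compile(r"^[A-Za-z]+:\d+$")


def convert_extension(extension):
    """Return 'ID(target)', or None if the extension cannot be converted."""
    match = EXTENSION_RE.match(extension)
    if match is None:
        return None
    relation, target = match.groups()
    if ID_RE.match(relation):
        return f"{relation}({target})"  # already an id: leave as is
    relation_id = EXTENSION_RELATION_TO_ID.get(relation)
    if relation_id is None:
        return None  # unmapped relation text: report it
    return f"{relation_id}({target})"
```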

Mapping three-letter GO codes to ECO ids.

three-letter GO code | ECO ID
ISS | ECO:0000250
IEP | ECO:0000270
NAS | ECO:0000303
TAS | ECO:0000304
IC | ECO:0000305
ND | ECO:0000307
IDA | ECO:0000314
IMP | ECO:0000315
IGI | ECO:0000316
IPI | ECO:0000353
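
The evidence code lookup is a plain dictionary. In this sketch the function name and the choice to raise an error on an unknown code (rather than skip the row) are assumptions.

```python
GO_CODE_TO_ECO = {
    "ISS": "ECO:0000250",
    "IEP": "ECO:0000270",
    "NAS": "ECO:0000303",
    "TAS": "ECO:0000304",
    "IC": "ECO:0000305",
    "ND": "ECO:0000307",
    "IDA": "ECO:0000314",
    "IMP": "ECO:0000315",
    "IGI": "ECO:0000316",
    "IPI": "ECO:0000353",
}


def eco_id(go_code):
    """Map a gop_goinference evidence code to its ECO id."""
    code = go_code.strip().upper()
    try:
        return GO_CODE_TO_ECO[code]
    except KeyError:
        # Codes outside the table above are not expected in the OA data.
        raise ValueError(f"No ECO mapping for evidence code {code!r}") from None
```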

Questions

  • What about blank OA entries, e.g. pgid 14222?
    • Blank entries were ignored.

Protein2GO Annotations

  • Other mappings needed:
    • PMID to WBPaper
    • Relation text to gorel id
    • Curators without orcids to GOC abbreviations
GPAD 2.0 column number | GPAD 2.0 column name | Action | UniProt Source File Example | WormBase Output File Example | Report on parsing failures
1 | Annotated entity* | UniProtKB: convert accession to a WBGene id using the latest WB gpi file | UniProtKB:G5ED58 | WB:WBGene00006925 | Yes
1 | Annotated entity* | UniProtKB: strip the '-digit' suffix, then convert to a WBGene id using the latest WB gpi file | UniProtKB:P34708-1 | WB:WBGene00006604 | Yes
1 | Annotated entity* | ComplexPortal: Ignore | ComplexPortal:CPX-1000 | n/a | n/a
1 | Annotated entity* | RNAcentral: Ignore | RNAcentral:URS00000082FF_6239 | n/a | n/a
2 | Negation | Leave as is | NOT | NOT | No
3 | Qualifier | Leave as is | RO:0002327 | RO:0002327 | No
4 | GO term ID | Leave as is | GO:0051306 | GO:0051306 | No
5 | Reference* | Leave GO_REFs as is; map PMID or DOI to corresponding WBPaper id and add WBPaper id as a pipe-separated value | PMID:10978280 | PMID:10978280|WB:WBPaper00004310 | Yes
6 | Evidence | Leave as is | ECO:0000314 | ECO:0000314 | No
7 | With/From* | Leave as is, except for UniProtKB: accessions; for UniProtKB: accessions, try to map to a WBGene id using the latest WB gpi file; if a UniProtKB: accession doesn't map to a WBGene id, then leave as is | UniProtKB:D9PTP8 | WB:WBGene00013354 | Yes; output a list of UniProtKB accessions that didn't map to a WBGene
8 | Interacting taxon | Leave as is | NCBITaxon:273526 | NCBITaxon:273526 | No
9 | Annotation date | Leave as is | 2006-02-03T12:26 | 2006-02-03T12:26 | No
10 | Assigned_by | Leave as is | WB | WB | No
11 | Annotation extensions | If relation is a text string, convert to an id according to the RO mapping table above. Otherwise, leave as is. | RO:0002233(WB:WBGene00000584) | RO:0002233(WB:WBGene00000584) | Yes; report on any relation text strings that don't map to an ontology id.
12 | Annotation properties* | Most will stay as is, except for history (see below). | contributor-id=https://orcid.org/0000-0002-1706-4196|comment=action:Updated by Kimberly Van Auken|model-state=??? | |


Annotated Entity

  • Use the latest WormBase gpi file to map UniProtKB accessions to WBGene ids: take the column 1 value in the incoming GPAD file, find the matching column 9 value in the WB gpi file, then take the corresponding WBGene id (column 2 in the WB gpi file).
  • Latest WormBase gpi file: ftp://ftp.wormbase.org/pub/wormbase/species/c_elegans/PRJNA13758/annotation/gene_product_info/c_elegans.canonical_bioproject.current_development.gene_product_info.gpi.gz
  • For input values that include a '-digit', e.g. UniProtKB:P37806-1, strip the '-digit' for the purposes of mapping to a WBGene and add a comment in the Annotation properties field: comment=Original annotation made to whatever the full UniProtKB accession is. For example: comment=Original annotation made to UniProtKB:P37806-1.
  • We will be ignoring any GPAD lines that have ComplexPortal or RNAcentral ids in Column 1.
  • We have made some manual annotations to organisms other than C. elegans, so there will be some GPAD lines for which there is no mapping from the UniProtKB accession to a WBGene id but for which we want to migrate the annotation to Noctua. To find these, we'll need to check if the unmappable UniProtKB accession is also in the 6239 GPAD file. If yes, discard; if no, keep. Report on which ones we kept and which ones we discarded.
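
The column 1 rules above can be sketched as follows. This assumes the gpi file is tab-delimited, has header/comment lines starting with '!', and holds pipe-separated xrefs in column 9. The function names, the status strings, and the `elegans_gpad_accessions` set (the column 1 accessions from the 6239 GPAD file) are hypothetical.

```python
import re

ISOFORM_SUFFIX = re.compile(r"-\d+$")  # the '-digit' on e.g. UniProtKB:P37806-1


def load_gpi_map(lines):
    """Map each UniProtKB xref in gpi column 9 to 'WB:' + the column 2 id."""
    mapping = {}
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header/comment and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        gene = cols[1] if cols[1].startswith("WB:") else f"WB:{cols[1]}"
        for xref in cols[8].split("|"):
            if xref.startswith("UniProtKB:"):
                mapping[xref] = gene
    return mapping


def map_entity(entity, gpi_map):
    """Return (status, wb_id, comment) for a GPAD column 1 value.

    status is 'ignored', 'mapped', or 'unmapped'; comment is the extra
    annotation property for inputs that carried a '-digit' suffix.
    """
    if entity.startswith(("ComplexPortal:", "RNAcentral:")):
        return "ignored", None, None
    base = ISOFORM_SUFFIX.sub("", entity)
    wb_id = gpi_map.get(base)
    if wb_id is None:
        return "unmapped", None, None
    comment = None
    if base != entity:
        comment = f"comment=Original annotation made to {entity}."
    return "mapped", wb_id, comment


def keep_unmapped(accession, elegans_gpad_accessions):
    """Keep an unmappable accession only if it is absent from the 6239 GPAD file."""
    return accession not in elegans_gpad_accessions
```

Returning a status alongside the id keeps the "kept" and "discarded" reports easy to build: every 'unmapped' result goes through `keep_unmapped` and is written to one list or the other.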

References

  • Map incoming doi or PMID to WBPaper id using the pap_identifier table.
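
A sketch of the reference handling, assuming the pap_identifier lookup has already been loaded into a dict keyed by the reference as it appears in the GPAD file (e.g. `PMID:10978280`) with the WBPaper id as the value. The dict, its key format, and the function name are assumptions; the actual layout of pap_identifier is not described here.

```python
def add_wbpaper(reference, identifier_to_wbpaper):
    """Return (column 5 value, ok) for an incoming column 5 value.

    GO_REFs are left as is. For a PMID or DOI, the WBPaper id is added as a
    pipe-separated value. ok is False when no WBPaper id was found, so the
    line can be included in the parsing-failure report.
    """
    parts = reference.split("|")
    if any(part.startswith("GO_REF:") for part in parts):
        return reference, True
    for part in parts:
        if part.startswith(("PMID:", "DOI:")):
            wbpaper = identifier_to_wbpaper.get(part)
            if wbpaper:
                return f"{reference}|WB:{wbpaper}", True
    return reference, False
```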

With/From

Annotation Properties

  • Contributor
    • Most contributors are captured with an orcid.
    • However, Carol and Josh do not have orcids, so we need to populate a GOC abbreviation for their contributor id.
      • If contributor-id is blank and comment=action:Added by Josh Jaffery [Expired account], then populate contributor-id=GOC:jja
      • If contributor-id is blank and comment=action:Added by Carol Bastiani [Expired account], then populate contributor-id=GOC:cab1
  • History
    • Group annotations by id, e.g. id=GOA:2113472118
    • If more than one line with same id, check corresponding date field (column 9) for each line
    • On the line with the most recent date in each group, add creation-date=YYYY-MM-DD taken from the earliest annotation, and a modification-date=YYYY-MM-DD for each subsequent date
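
The contributor fallback and the history grouping can be sketched like this. It assumes column 12 has already been split on '|' into a list of 'key=value' strings and that dates are ISO-style strings, which sort chronologically as text. The function names are hypothetical. Dates are carried over exactly as they appear in column 9, and an id with a single line gets only a creation-date; both of those are choices made for this sketch, not rules from this page.

```python
from collections import defaultdict

EXPIRED_ACCOUNT_CONTRIBUTORS = {
    "comment=action:Added by Josh Jaffery [Expired account]": "GOC:jja",
    "comment=action:Added by Carol Bastiani [Expired account]": "GOC:cab1",
}


def fill_contributor(props):
    """props: list of 'key=value' strings from column 12."""
    has_contributor = any(
        p.startswith("contributor-id=") and p != "contributor-id=" for p in props
    )
    if has_contributor:
        return props
    for comment, goc_id in EXPIRED_ACCOUNT_CONTRIBUTORS.items():
        if comment in props:
            kept = [p for p in props if p != "contributor-id="]
            return kept + [f"contributor-id={goc_id}"]
    return props


def history_properties(rows):
    """rows: iterable of (annotation id, column 9 date), one per GPAD line.

    Returns {annotation id: [creation-date=..., modification-date=..., ...]}.
    """
    dates_by_id = defaultdict(set)
    for annotation_id, date in rows:
        dates_by_id[annotation_id].add(date)
    history = {}
    for annotation_id, dates in dates_by_id.items():
        ordered = sorted(dates)  # earliest first
        props = [f"creation-date={ordered[0]}"]
        props += [f"modification-date={d}" for d in ordered[1:]]
        history[annotation_id] = props
    return history
```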

Final File for Import

  • The final files for import will be one file of OA annotations and one file of Protein2GO annotations, both in GPAD format.
  • We'll need either to put the files somewhere Dustin can pick them up for the import, or for me to gzip them and email them to him.