Noctua - Upload of WB Manual Annotations

From WormBaseWiki

Revision as of 19:58, 20 July 2020

GOC GPAD/GPI 2.0 Specifications

https://github.com/geneontology/go-annotation/blob/master/specs/gpad-gpi-2-0.md

OA Annotations

Mapping from gop_ postgres tables to GPAD 2.0 columns

Skip all entries that have the value 'False Positive' in gop_falsepositive

gop_ postgres table name | GPAD 2.0 column | Action | Example
gop_wbgene | 1 | Preface each value with 'WB:' | WB:WBGene00006925
gop_qualifier | 2 | Add text string "NOT" (Note: an OA query for "NOT" didn't return any values, so I don't think we actually have any of these in the OA.) | NOT
gop_qualifier | 3 | Map text value to Relations Ontology (RO) term id (see below); add RO id. | RO:0001025
gop_goid | 4 | Add GO term id as it exists in table. | GO:0051306
gop_accession | 5 | Add id as it exists in table. | GO_REF:0000015
gop_paper | 5 | Add WBPaper ID and corresponding PMID, pipe-separated. | PMID:10978280|WB:WBPaper00004310
gop_goinference | 6 | Map three-letter GO code to ECO code (see below); add ECO id. | ECO:0000314
gop_with_wbgene | 7 | Preface each value with 'WB:'; comma-separate multiple values | WB:WBGene00000001
gop_with | 7 | Add id as it exists in table; comma-separate multiple values | FB:FBgn0003719
gop_with_phenotype | 7 | Add id as it exists in table; comma-separate multiple values | WBPhenotype:0000689
gop_with_rnai | 7 | Preface each value with 'WB:'; comma-separate multiple values | WB:WBRNAi00001974
gop_with_wbvariation | 7 | Preface each value with 'WB:'; comma-separate multiple values | WB:WBVar00242156
- | 8 | No action; I don't think we have any values for an interacting taxon.
gop_lastupdate | 9 | If YYYY-MM-DD, add as it exists in table. If YYYY-MM-DD HH:MM:SS, convert to YYYY-MM-DDTHH:MM. | 2020-05-13 or 2006-02-03T12:26
no OA table | 10 | Add WB | WB
gop_xrefto | 11 | Convert relation name to RO id, add value, directly and parenthetically, after RO id. | RO:0002233(WB:WBGene00000584)
?? | 12 | Add postgres annotation id, prefixed with 'id=WBOA:' | id=WBOA:3565
gop_curator | 12 | If available, map curator to ORCID and prefix with 'contributor-id=https://orcid.org/'. If no ORCID, add 'GOC:cab1' | contributor-id=https://orcid.org/0000-0002-1478-7671 or GOC:cab1
gop_comment | 12 | Add free text, prefixed with 'comment=' | comment=2020-03-17; flagged FP prior to Noctua upload; no ISS With/From; more specific PAINT annotation exists.
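
The following is a rough Python illustration of the OA rules for columns 5, 7 and 9. It formats a paper reference, gathers the gop_with* values and normalizes gop_lastupdate. The sketch assumes each OA row has already been read from postgres into a dict of lists keyed by table name, and that the PMID for gop_paper has already been looked up. The function names are hypothetical. Joining values from different gop_with* tables with commas, in the order listed above, is also an assumption beyond what the table states.

```python
from __future__ import annotations

from datetime import datetime

# gop_with* tables in the order listed above; True means "preface with 'WB:'".
WITH_TABLES = (
    ("gop_with_wbgene", True),
    ("gop_with", False),
    ("gop_with_phenotype", False),
    ("gop_with_rnai", True),
    ("gop_with_wbvariation", True),
)


def wb(value: str) -> str:
    """Preface a value with 'WB:' unless it already has that prefix."""
    return value if value.startswith("WB:") else f"WB:{value}"


def paper_column(wbpaper: str, pmid: str | None) -> str:
    """Column 5 from gop_paper: PMID and WBPaper ID, pipe-separated."""
    return f"PMID:{pmid}|{wb(wbpaper)}" if pmid else wb(wbpaper)


def with_from_column(row: dict[str, list[str]]) -> str:
    """Column 7: every gop_with* value, comma-separated (assumed ordering)."""
    values = []
    for table, needs_prefix in WITH_TABLES:
        for value in row.get(table, []):
            values.append(wb(value) if needs_prefix else value)
    return ",".join(values)


def date_column(lastupdate: str) -> str:
    """Column 9: keep YYYY-MM-DD; turn YYYY-MM-DD HH:MM:SS into YYYY-MM-DDTHH:MM."""
    value = lastupdate.strip()
    try:
        return datetime.strptime(value, "%Y-%m-%d").strftime("%Y-%m-%d")
    except ValueError:
        # Assumption: anything after the seconds (fractions, time zone) is dropped.
        stamp = datetime.strptime(value[:19], "%Y-%m-%d %H:%M:%S")
        return stamp.strftime("%Y-%m-%dT%H:%M")
```

For example, date_column("2006-02-03 12:26:45") gives "2006-02-03T12:26", matching the example in the table.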

Mapping relation names to RO ids.

qualifier name (gop_qualifier) | RO ID
part_of | BFO:0000050
enables | RO:0002327
acts_upstream_of_or_within | RO:0002264
colocalizes_with | RO:0002325
involved_in | RO:0002331
located_in | RO:0001025
contributes_to | RO:0002326
has_input | RO:0002233
happens_during | RO:0002092
has_direct_input | GOREL:0000752
in_absence_of | GOREL:0000755
in_presence_of | GOREL:0000027
localization_dependent_on | GOREL:0000009
RO:0002211_activity_of | GOREL:0098702
dependent_on | GOREL:0000004
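
This relation table serves gop_qualifier (column 3), gop_xrefto and the Protein2GO annotation extensions (column 11). Below is a minimal Python sketch of the extension conversion. It assumes extensions are written as relation(filler) units separated by commas or pipes, with no parentheses inside a filler. It also collects relation names that are neither in the table nor already an id, so they can be reported.

```python
from __future__ import annotations

import re

RELATION_TO_ID = {
    "part_of": "BFO:0000050",
    "enables": "RO:0002327",
    "acts_upstream_of_or_within": "RO:0002264",
    "colocalizes_with": "RO:0002325",
    "involved_in": "RO:0002331",
    "located_in": "RO:0001025",
    "contributes_to": "RO:0002326",
    "has_input": "RO:0002233",
    "happens_during": "RO:0002092",
    "has_direct_input": "GOREL:0000752",
    "in_absence_of": "GOREL:0000755",
    "in_presence_of": "GOREL:0000027",
    "localization_dependent_on": "GOREL:0000009",
    "RO:0002211_activity_of": "GOREL:0098702",
    "dependent_on": "GOREL:0000004",
}

# One relation(filler) unit, e.g. has_input(WB:WBGene00000584).
UNIT = re.compile(r"([^(),|]+)\(([^()]*)\)")
# A relation that is already an id such as RO:0002233.
CURIE = re.compile(r"[A-Za-z]+:\d+")


def convert_extensions(field: str, unmapped: set[str]) -> str:
    """Replace relation names with ids; record names that could not be mapped."""

    def replace(match: re.Match) -> str:
        relation, filler = match.group(1).strip(), match.group(2)
        if relation in RELATION_TO_ID:
            return f"{RELATION_TO_ID[relation]}({filler})"
        if not CURIE.fullmatch(relation):
            unmapped.add(relation)  # text string with no id: report it
        return match.group(0)  # already an id, or unmapped: leave as is

    return UNIT.sub(replace, field)
```

The sketch checks the table before the id pattern so that the odd name RO:0002211_activity_of is still mapped to GOREL:0098702.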

Mapping three-letter GO codes to ECO ids.

three-letter GO code | ECO ID
ISS | ECO:0000250
IEP | ECO:0000270
NAS | ECO:0000303
TAS | ECO:0000304
IC | ECO:0000305
ND | ECO:0000307
IDA | ECO:0000314
IMP | ECO:0000315
IGI | ECO:0000316
IPI | ECO:0000353
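
Here is a small Python sketch of the evidence-code lookup for column 6. The page does not say how unmapped codes should be handled. Raising an error for a code not in the table, so it is reported instead of being written out silently, is an assumption.

```python
GO_CODE_TO_ECO = {
    "ISS": "ECO:0000250",
    "IEP": "ECO:0000270",
    "NAS": "ECO:0000303",
    "TAS": "ECO:0000304",
    "IC": "ECO:0000305",
    "ND": "ECO:0000307",
    "IDA": "ECO:0000314",
    "IMP": "ECO:0000315",
    "IGI": "ECO:0000316",
    "IPI": "ECO:0000353",
}


def evidence_to_eco(code: str) -> str:
    """Map a GO evidence code from gop_goinference to its ECO id."""
    try:
        return GO_CODE_TO_ECO[code.strip().upper()]
    except KeyError:
        raise ValueError(f"no ECO mapping for evidence code {code!r}") from None
```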

Questions

  • What about blank OA entries, e.g. pgid 14222?
    • Blank entries were ignored.

Protein2GO Annotations

GPAD 2.0 column number | GPAD 2.0 column name | Action | UniProt Source File Example | WormBase Output File Example | Report on parsing failures
1 | Annotated entity* | Convert each UniProtKB: accession to a WBGene id using the latest WB gpi file | UniProtKB:G5ED58 | WB:WBGene00006925 | Yes
2 | Negation | Leave as is | NOT | NOT | No
3 | Qualifier* | Leave as is, except for BFO:0000050* | RO:0002327 | RO:0002327 | No
4 | GO term ID | Leave as is | GO:0051306 | GO:0051306 | No
5 | Reference* | Leave GO_REFs as is; map PMID or DOI to corresponding WBPaper id and add WBPaper id as a pipe-separated value | PMID:10978280 | PMID:10978280|WB:WBPaper00004310 | Yes
6 | Evidence | Leave as is | ECO:0000314 | ECO:0000314 | No
7 | With/From* | Leave as is, except for UniProtKB: accessions; for UniProtKB: accessions, try to map to a WBGene id using the latest WB gpi file; if a UniProtKB: accession doesn't map to a WBGene id, then leave it as is | UniProtKB:D9PTP8 | WB:WBGene00013354 | Yes - output a list of UniProtKB accessions that didn't map to a WBGene
8 | Interacting taxon | Leave as is | NCBITaxon:273526
9 | Annotation date | Leave as is | 2006-02-03T12:26
10 | Assigned_by | Leave as is | WB
11 | Annotation extensions | If relation is a text string, convert to an id according to the RO mapping table above. Otherwise, leave as is. Report on any relation text strings that didn't map to an id. | RO:0002233(WB:WBGene00000584)
12 | Annotation properties* | Most will stay as is, except for history (see below). | contributor-id=https://orcid.org/0000-0002-1706-4196|comment=action:Updated by Kimberly Van Auken|model-state=deleted


Annotated Entity

Qualifiers

  • For annotation lines that use BFO:0000050, check GO ID parentage. If GO:0032991 is a parent term, leave BFO:0000050. If GO:0032991 is not a parent term, change to RO:0001025.
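
One way to do this parentage check, sketched in Python, is to walk up the ontology from the annotated GO term and keep BFO:0000050 only if GO:0032991 is reached. The `parents` dictionary maps each GO id to its direct parent ids. It is a hypothetical input that would have to be built from the GO ontology file. This section does not specify which relationship types count as parentage (is_a only, or is_a plus part_of), so that choice is an assumption. The sketch also keeps BFO:0000050 when the annotated term is GO:0032991 itself, which is likewise an assumption.

```python
from __future__ import annotations

from collections import deque

PROTEIN_CONTAINING_COMPLEX = "GO:0032991"


def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """All terms reachable from `term` by following parent links (breadth-first)."""
    seen: set[str] = set()
    queue = deque(parents.get(term, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents.get(current, []))
    return seen


def check_part_of(qualifier: str, go_id: str, parents: dict[str, list[str]]) -> str:
    """Keep BFO:0000050 for complex terms; otherwise change it to RO:0001025."""
    if qualifier != "BFO:0000050":
        return qualifier
    if go_id == PROTEIN_CONTAINING_COMPLEX or PROTEIN_CONTAINING_COMPLEX in ancestors(go_id, parents):
        return qualifier
    return "RO:0001025"
```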

References

  • Map incoming doi or PMID to WBPaper id using the pap_identifier table.
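
This Python sketch implements the column 5 rule: GO_REFs are left alone, and PMIDs and DOIs get their WBPaper id appended as a pipe-separated value. It assumes the pap_identifier table has already been loaded into a hypothetical `paper_ids` dictionary. Its keys are references written as they appear in GPAD, such as 'PMID:10978280', with DOIs stored as 'DOI:' plus the lower-cased DOI. Its values are ids such as 'WBPaper00004310'. How identifiers are actually stored in pap_identifier is not covered here. DOIs are compared in lower case because DOI names are case-insensitive.

```python
from __future__ import annotations


def reference_key(ref: str) -> str:
    """Lookup key for a reference; DOI names are case-insensitive, so lower-case them."""
    prefix, _, local = ref.partition(":")
    return f"DOI:{local.lower()}" if prefix.upper() == "DOI" else ref


def map_reference(field: str, paper_ids: dict[str, str], failures: list[str]) -> str:
    """Column 5: keep every reference and append matching WBPaper ids, pipe-separated."""
    refs = field.split("|")
    papers: list[str] = []
    for ref in refs:
        if ref.startswith("GO_REF:"):
            continue  # GO_REFs stay as they are
        paper = paper_ids.get(reference_key(ref))
        if paper is None:
            failures.append(ref)  # report: no WBPaper found for this PMID/DOI
        elif paper not in papers:
            papers.append(paper)
    return "|".join(refs + [f"WB:{paper}" for paper in papers])
```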

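With/From

  • Use the latest WormBase gpi file to map UniProtKB accessions (column 1 in incoming GPAD file; column 9 in WB gpi file) to the corresponding WBGene ID (column 2 in WB gpi file).
  • Latest WormBase gpi file: ftp://ftp.wormbase.org/pub/wormbase/species/c_elegans/PRJNA13758/annotation/gene_product_info/c_elegans.canonical_bioproject.current_development.gene_product_info.gpi.gz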
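
Here is a Python sketch of the UniProtKB-to-WBGene lookup for columns 1 and 7 of the Protein2GO file. It reads the gpi file (gzipped or not) and skips '!' header lines. It then maps each UniProtKB: entry in gpi column 9 to the id in gpi column 2, prefixed with the database name from gpi column 1. Three parts of the sketch are assumptions about the file: skipping rows whose column 2 is not a WBGene id, splitting column 9 on '|', and keeping the first gene when an accession appears on several rows. Unmapped accessions are collected so they can be reported, as the Protein2GO table asks for column 7.

```python
from __future__ import annotations

import gzip


def load_uniprot_to_wbgene(path: str) -> dict[str, str]:
    """Map UniProtKB: accessions (gpi column 9) to gene ids (gpi columns 1 and 2)."""
    opener = gzip.open if path.endswith(".gz") else open
    mapping: dict[str, str] = {}
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("!") or not line.strip():
                continue  # header or blank line
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 9 or not cols[1].startswith("WBGene"):
                continue  # assumption: only gene rows are wanted
            for xref in cols[8].split("|"):
                if xref.startswith("UniProtKB:"):
                    # Assumption: keep the first gene if an accession repeats.
                    mapping.setdefault(xref, f"{cols[0]}:{cols[1]}")
    return mapping


def map_ids(field: str, mapping: dict[str, str], unmapped: set[str]) -> str:
    """Replace mappable UniProtKB: ids in a column 1 or 7 value; leave the rest."""
    groups = []
    for group in field.split("|"):
        ids = []
        for ident in group.split(","):
            if ident.startswith("UniProtKB:") and ident not in mapping:
                unmapped.add(ident)  # for the "didn't map to a WBGene" report
            ids.append(mapping.get(ident, ident))
        groups.append(",".join(ids))
    return "|".join(groups)
```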

*Annotation Properties

  • Contributor
    • Most contributors are captured with an ORCID.
    • However, Carol and Josh do not have ORCIDs, so we need to populate a GOC abbreviation for their contributor id.
      • If contributor-id is blank and comment=action:Added by Josh Jaffery [Expired account], then populate contributor-id=GOC:jja
      • If contributor-id is blank and comment=action:Added by Carol Bastiani [Expired account], then populate contributor-id=GOC:cab1
  • History
    • Group annotations by id, e.g. id=GOA:2113472118
    • If more than one line with same id, check corresponding date field (column 9) for each line
    • For most recent date of grouped annotations, leave line as is
    • For earlier dates of grouped annotations, add 'model-state=deleted' to annotation properties field (column 12) by pipe-separating it from the last annotation properties entry
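
The contributor and history steps above are sketched below in Python. Three points are assumptions: that "blank" covers both a missing contributor-id and an empty "contributor-id=", that the comment text matches exactly, and that the new contributor-id goes first in column 12. The sketch compares column 9 dates as strings, which works for ISO dates in the same format. When two lines share the most recent date, both are left as is, which is also an assumption.

```python
from __future__ import annotations

from collections import defaultdict

# Expired accounts without ORCIDs, keyed by their comment= property.
EXPIRED_ACCOUNTS = {
    "comment=action:Added by Josh Jaffery [Expired account]": "contributor-id=GOC:jja",
    "comment=action:Added by Carol Bastiani [Expired account]": "contributor-id=GOC:cab1",
}


def fill_contributor(props: str) -> str:
    """Column 12: add a GOC contributor-id for Josh or Carol when none is present."""
    parts = [p for p in props.split("|") if p and p != "contributor-id="]
    if any(p.startswith("contributor-id=") for p in parts):
        return props  # an ORCID (or other id) is already there
    for comment, contributor in EXPIRED_ACCOUNTS.items():
        if comment in parts:
            return "|".join([contributor] + parts)
    return props


def mark_history(rows: list[list[str]]) -> None:
    """Append model-state=deleted to every line except the most recent in each id= group.

    `rows` are 12-column GPAD lines; column 9 (index 8) holds the date and
    column 12 (index 11) the annotation properties. Rows are modified in place.
    """
    groups: dict[str, list[list[str]]] = defaultdict(list)
    for row in rows:
        ids = [p for p in row[11].split("|") if p.startswith("id=")]
        if ids:
            groups[ids[0]].append(row)
    for group in groups.values():
        if len(group) < 2:
            continue
        # Assumption: all dates are YYYY-MM-DD or YYYY-MM-DDTHH:MM, so strings sort by date.
        latest = max(row[8] for row in group)
        for row in group:
            if row[8] != latest and "model-state=deleted" not in row[11].split("|"):
                row[11] += "|model-state=deleted"
```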

Final File for Import

  • The final file for import will be a concatenated file of OA and Protein2GO GPAD files
  • We'll need to make the file available somewhere for Dustin to pick up for the import.
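
Here is a Python sketch of the concatenation, assuming both inputs are GPAD 2.0 files whose header lines start with '!'. The GPAD 2.0 specification describes a '!gpad-version: 2.0' header line. Writing that line once and dropping each file's other header lines (such as generated-by or date-generated) is an assumption about what the import expects. The function name and the file names in the test are hypothetical.

```python
def concatenate_gpad(inputs: list, output: str) -> None:
    """Write one GPAD 2.0 header, then every annotation line from each input file."""
    with open(output, "w", encoding="utf-8") as out:
        out.write("!gpad-version: 2.0\n")
        for path in inputs:
            with open(path, encoding="utf-8") as handle:
                for line in handle:
                    if line.startswith("!") or not line.strip():
                        continue  # drop each file's own header lines and blank lines
                    out.write(line if line.endswith("\n") else line + "\n")
```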