Noctua - Upload of WB Manual Annotations

Latest revision as of 17:54, 10 March 2021

GOC GPAD/GPI 2.0 Specifications

https://github.com/geneontology/go-annotation/blob/master/specs/gpad-gpi-2-0.md

OA Annotations

Mapping from gop_ postgres tables to GPAD 2.0 column

Skip all entries that have the value 'False Positive' in gop_falsepositive

{| cellspacing="2" border="1"
|-
! gop_ postgres table name
! GPAD 2.0 column
! Action
! Example
|-
| gop_wbgene || 1 || Preface each value with 'WB:' || WB:WBGene00006925
|-
| gop_qualifier || 2 || This is the NEGATION field in the GPAD 2.0 file, but no action is needed as we don't have any negation in our GO OA annotations. || n/a
|-
| gop_qualifier || 3 || Map text value to Relations Ontology (RO) term id (see table below); populate with RO id. || RO:0001025
|-
| gop_goid || 4 || Add GO term id as it exists in table. || GO:0051306
|-
| gop_accession || 5 || Remove quotes and add id in table; we only have seven of these. || GO_REF:0000015
|-
| gop_paper || 5 || Add WBPaper ID and corresponding PMID, pipe separated. || PMID:10978280|WB:WBPaper00004310
|-
| gop_goinference || 6 || Map three-letter GO code to ECO code (see below); add ECO id. || ECO:0000314
|-
| gop_with_wbgene || 7 || Preface each value with 'WB:'; comma-separate multiple values || WB:WBGene00000001
|-
| gop_with || 7 || Add id as it exists in table; comma-separate multiple values || FB:FBgn0003719
|-
| gop_with_phenotype || 7 || Add id as it exists in table; comma-separate multiple values || WBPhenotype:0000689
|-
| gop_with_rnai || 7 || Preface each value with 'WB:'; comma-separate multiple values || WB:WBRNAi00001974
|-
| gop_with_wbvariation || 7 || Preface each value with 'WB:'; comma-separate multiple values || WB:WBVar00242156
|-
| - || 8 || No action; I don't think we have any values for an interacting taxon. ||
|-
| gop_lastupdate || 9 || If YYYY-MM-DD, add as it exists in table. If YYYY-MM-DD HH:MM:SS, convert to YYYY-MM-DDTHH:MM || 2020-05-13 or 2006-02-03T12:26
|-
| no OA table || 10 || Add WB || WB
|-
| gop_xrefto || 11 || Convert relation name to RO id, then add the value in parentheses directly after the RO id. || RO:0002233(WB:WBGene00000584)
|-
| ?? || 12 || Add postgres annotation id, prefixed with 'id=WBOA:' || id=WBOA:3565
|-
| gop_curator || 12 || If available, map curator to ORCID and prefix with 'contributor-id=https://orcid.org/'. If no ORCID, add 'GOC:cab1' || contributor-id=https://orcid.org/0000-0002-1478-7671 or GOC:cab1
|-
| gop_comment || 12 || Add free text, prefixed with 'comment=' || comment=2020-03-17; flagged FP prior to Noctua upload; no ISS With/From; more specific PAINT annotation exists.
|}
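
Two of the rules in the table above (the date conversion for column 9 and the assembly of column 7) can be sketched in Python as follows. This is a minimal illustration, not the script actually used: the function names are hypothetical, and it assumes that gop_lastupdate holds either a plain date or a timestamp with whole seconds and no time zone.

```python
from datetime import datetime


def gpad_date(lastupdate: str) -> str:
    """Convert a gop_lastupdate value to the GPAD 2.0 date format."""
    value = lastupdate.strip()
    if len(value) == 10:
        # Already YYYY-MM-DD: validate it and pass it through unchanged.
        datetime.strptime(value, "%Y-%m-%d")
        return value
    # YYYY-MM-DD HH:MM:SS becomes YYYY-MM-DDTHH:MM (seconds are dropped).
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m-%dT%H:%M")


def with_from(needs_prefix=(), as_is=()) -> str:
    """Assemble GPAD column 7 from the gop_with_* values.

    needs_prefix: values from gop_with_wbgene, gop_with_rnai and
                  gop_with_wbvariation (assumed to be stored without 'WB:').
    as_is:        values from gop_with and gop_with_phenotype.
    """
    prefixed = [f"WB:{value}" for value in needs_prefix]
    return ",".join([*prefixed, *as_is])
```

For example, `gpad_date("2006-02-03 12:26:45")` yields `2006-02-03T12:26`, matching the example in the table.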

Mapping gene product-to-term relation names to RO ids.

{| cellspacing="2" border="1"
|-
! qualifier name (gop_qualifier)
! RO ID
! number of annotations in WS280 ace file (353 annotations total)
|-
| acts_upstream_of_or_within || RO:0002264 || 9
|-
| located_in || RO:0001025 || 10
|-
| involved_in || RO:0002331 || 307
|-
| enables || RO:0002327 || 18
|-
| part_of || BFO:0000050 || 8
|}
  • Note: no instances of these gp2term relations in the OA:
    • colocalizes_with (RO:0002325)
    • contributes_to (RO:0002326)
  • Note: found one annotation coming from the OA that lacked a gp2term relation; updated that for WS281
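
The qualifier lookup above amounts to a small dictionary. The sketch below is illustrative only (the names `QUALIFIER_TO_RO` and `qualifier_to_ro` are made up here); it raises an error on an unknown or blank qualifier so that annotations lacking a gp2term relation are noticed rather than silently written out.

```python
# Relation ids copied from the table and notes above.
QUALIFIER_TO_RO = {
    "acts_upstream_of_or_within": "RO:0002264",
    "located_in": "RO:0001025",
    "involved_in": "RO:0002331",
    "enables": "RO:0002327",
    "part_of": "BFO:0000050",
    # Not currently used in the OA, but listed in the notes above.
    "colocalizes_with": "RO:0002325",
    "contributes_to": "RO:0002326",
}


def qualifier_to_ro(qualifier: str) -> str:
    """Return the relation id for a gop_qualifier text value."""
    key = qualifier.strip()
    if key not in QUALIFIER_TO_RO:
        # Covers blank qualifiers as well as unexpected text.
        raise ValueError(f"no RO mapping for qualifier {qualifier!r}")
    return QUALIFIER_TO_RO[key]
```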

Mapping annotation extension relations to RO ids

{| cellspacing="2" border="1"
|-
! relation label
! RO ID
! number of annotations in WS280 ace file (353 annotations total)
|-
| has_input || RO:0002233 || 10
|-
| happens_during || RO:0002092 || 4
|}
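
A possible way to apply this mapping to an annotation extension is sketched below. It assumes the extension is available as text of the form `relation(target)`, which may not match how gop_xrefto actually stores it; the function name is hypothetical. Relations that are already ontology ids are passed through, and unmapped labels raise an error so they can be reported.

```python
import re

EXTENSION_TO_RO = {
    "has_input": "RO:0002233",
    "happens_during": "RO:0002092",
}

# Assumed input shape: relation(target), with no nested parentheses.
_EXTENSION = re.compile(r"^(?P<rel>[^()]+)\((?P<target>[^()]+)\)$")


def convert_extension(extension: str) -> str:
    """Replace a relation label with its RO id, keeping the target."""
    match = _EXTENSION.match(extension.strip())
    if match is None:
        raise ValueError(f"unrecognised extension format: {extension!r}")
    relation = match["rel"].strip()
    if re.fullmatch(r"[A-Za-z]+:\d+", relation):
        # Already an ontology id such as RO:0002233; leave as is.
        return f"{relation}({match['target']})"
    if relation not in EXTENSION_TO_RO:
        raise ValueError(f"no RO mapping for relation {relation!r}")
    return f"{EXTENSION_TO_RO[relation]}({match['target']})"
```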

Mapping three-letter GO codes to ECO ids.

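{| cellspacing="2" border="1"
|-
! three-letter GO code
! ECO ID
|-
| ISS || ECO:0000250
|-
| IEP || ECO:0000270
|-
| NAS || ECO:0000303
|-
| TAS || ECO:0000304
|-
| IC || ECO:0000305
|-
| ND || ECO:0000307
|-
| IDA || ECO:0000314
|-
| IMP || ECO:0000315
|-
| IGI || ECO:0000316
|-
| IPI || ECO:0000353
|}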

Questions

  • What about blank OA entries, e.g. pgid 14222?
    • Blank entries were ignored.

Protein2GO Annotations

  • Input files:
    • GPAD from Protein2GO
    • Latest WB gpi file: ftp://ftp.wormbase.org/pub/wormbase/species/c_elegans/PRJNA13758/annotation/gene_product_info/c_elegans.canonical_bioproject.current_development.gene_product_info.gpi.gz

  • Other mappings needed:
    • PMID to WBPaper
    • Relation text to gorel id
    • Curators without orcids to GOC abbreviations
{| cellspacing="2" border="1"
|-
! GPAD 2.0 column number
! GPAD 2.0 column name
! Action
! UniProt Source File Example
! WormBase Output File Example
! Report on parsing failures
|-
| 1 || Annotated entity* || Convert each UniProtKB: accession to a WBGene id using the latest WB gpi file || UniProtKB:G5ED58 || WB:WBGene00006925 || Yes
|-
| 2 || Negation || Leave as is || NOT || NOT || No
|-
| 3 || Qualifier || Leave as is || RO:0002327 || RO:0002327 || No
|-
| 4 || GO term ID || Leave as is || GO:0051306 || GO:0051306 || No
|-
| 5 || Reference* || Leave GO_REFs as is; map PMID or DOI to corresponding WBPaper id and add WBPaper id as a pipe-separated value || PMID:10978280 || PMID:10978280|WB:WBPaper00004310 || Yes
|-
| 6 || Evidence || Leave as is || ECO:0000314 || ECO:0000314 || No
|-
| 7 || With/From* || Leave as is, except for UniProtKB: accessions; for UniProtKB: accessions, try to map to a WBGene id using the latest WB gpi file; if a UniProtKB: accession doesn't map to a WBGene id, then leave as is || UniProtKB:D9PTP8 || WB:WBGene00013354 || Yes - output a list of UniProtKB accessions that didn't map to a WBGene
|-
| 8 || Interacting taxon || Leave as is || NCBITaxon:273526 || NCBITaxon:273526 || No
|-
| 9 || Annotation date || Leave as is || 2006-02-03T12:26 || 2006-02-03T12:26 || No
|-
| 10 || Assigned_by || Leave as is || WB || WB || No
|-
| 11 || Annotation extensions || If relation is a text string, convert to an id according to the RO mapping table above. Otherwise, leave as is. || RO:0002233(WB:WBGene00000584) || RO:0002233(WB:WBGene00000584) || Yes - report on any relation text strings that don't map to an ontology id.
|-
| 12 || Annotation properties* || Most will stay as is, except for history (see below). || id=GOA:2113483848|contributor-id=https://orcid.org/0000-0002-1706-4196|comment=action:Updated by Kimberly Van Auken|model-state=??? || ||
|}


Annotated Entity

  • Use latest WormBase gpi file to map UniProtKB accessions (column 1 in incoming GPAD file; column 9 in WB gpi file) to corresponding WBGene ID (column 2 in WB gpi file)
  • Latest WormBase gpi file: ftp://ftp.wormbase.org/pub/wormbase/species/c_elegans/PRJNA13758/annotation/gene_product_info/c_elegans.canonical_bioproject.current_development.gene_product_info.gpi.gz
  • For input values that include a '-digit' isoform suffix, e.g. UniProtKB:P37806-1, strip the '-digit' for the purposes of mapping to a WBGene and add a comment in the Annotation properties field that records the full original accession. For example: comment=Original annotation made to UniProtKB:P37806-1.
  • We have made some manual annotations to organisms other than C. elegans, so there will be some GPAD lines for which there is no mapping from the UniProtKB accession to a WBGene id.
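
The mapping steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the production script: the function names are hypothetical, the gpi file is assumed to be tab-delimited with '!' header lines, column 2 is assumed to hold the bare WBGene id, and multiple cross-references in column 9 are assumed to be pipe-separated.

```python
import re

# Matches accessions with an isoform suffix, e.g. UniProtKB:P37806-1.
_ISOFORM = re.compile(r"^(UniProtKB:[A-Z0-9]+)-\d+$")


def load_gpi_map(lines):
    """Build a UniProtKB accession -> WB id lookup from gpi lines."""
    mapping = {}
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        gene_id = cols[1] if cols[1].startswith("WB:") else f"WB:{cols[1]}"
        for xref in cols[8].split("|"):
            if xref.startswith("UniProtKB:"):
                mapping[xref] = gene_id
    return mapping


def map_entity(accession, gpi_map):
    """Return (WB id or None, comment property or None) for column 1."""
    match = _ISOFORM.match(accession)
    base = match.group(1) if match else accession
    comment = (
        f"comment=Original annotation made to {accession}" if match else None
    )
    # None signals an unmapped accession (e.g. a non-C. elegans protein)
    # that should go into the parsing-failure report.
    return gpi_map.get(base), comment
```

The lines passed to `load_gpi_map` can come from `gzip.open(path, "rt")` on the downloaded gpi file.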

References

  • Map incoming doi or PMID to WBPaper id using the pap_identifier table.
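
As a rough sketch of the reference rule, the function below takes a lookup that has already been built from pap_identifier. The shape of that lookup (keys such as `PMID:10978280`, values such as `WBPaper00004310`) is an assumption made for illustration and does not reflect the actual table layout; the function name is hypothetical.

```python
def map_reference(reference, paper_ids):
    """Return (column 5 value, mapped_ok) for one incoming reference.

    paper_ids: hypothetical dict from 'PMID:...' or 'DOI:...' to a
               WBPaper id, built beforehand from pap_identifier.
    """
    if reference.startswith("GO_REF:"):
        return reference, True  # GO_REFs are left as is
    wbpaper = paper_ids.get(reference)
    if wbpaper is None:
        # Keep the original value and flag it for the failure report.
        return reference, False
    return f"{reference}|WB:{wbpaper}", True
```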

With/From

  • Use latest WormBase gpi file to map UniProtKB accessions (column 7 in incoming GPAD file; column 9 in WB gpi file) to corresponding WBGene ID (column 2 in WB gpi file)
  • Latest WormBase gpi file: ftp://ftp.wormbase.org/pub/wormbase/species/c_elegans/PRJNA13758/annotation/gene_product_info/c_elegans.canonical_bioproject.current_development.gene_product_info.gpi.gz

Annotation Properties

  • Contributor
    • Most contributors are captured with an orcid.
    • However, Carol and Josh do not have orcids, so we need to populate a GOC abbreviation for their contributor id.
      • If contributor-id is blank and comment=action:Added by Josh Jaffery [Expired account], then populate contributor-id=GOC:jja
      • If contributor-id is blank and comment=action:Added by Carol Bastiani [Expired account], then populate contributor-id=GOC:cab1
  • History
    • Group annotations by id, e.g. id=GOA:2113472118
    • If more than one line with same id, check corresponding date field (column 9) for each line
    • On the line with the most recent date in each group, add creation-date=YYYY-MM-DD taken from the earliest annotation, and modification-date=YYYY-MM-DD for each subsequent date
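
The contributor and history rules above might be implemented along these lines. This is a sketch of one reading of those rules, with hypothetical function names: annotation properties are modelled as a plain dict, and each input row is reduced to an (id, date) pair taken from column 12 and column 9.

```python
from collections import defaultdict

# GOC abbreviations for curators without an ORCID (from the list above).
EXPIRED_CURATORS = {
    "Josh Jaffery": "GOC:jja",
    "Carol Bastiani": "GOC:cab1",
}


def fill_contributor(props):
    """Add a GOC contributor-id when it is blank and the comment names
    one of the expired accounts; otherwise return props unchanged."""
    if props.get("contributor-id"):
        return props
    comment = props.get("comment", "")
    for name, goc_id in EXPIRED_CURATORS.items():
        if comment == f"action:Added by {name} [Expired account]":
            return {**props, "contributor-id": goc_id}
    return props


def history_properties(rows):
    """Map each id that occurs more than once to its history properties.

    rows: iterable of (annotation_id, date) pairs, where date is
          YYYY-MM-DD or YYYY-MM-DDTHH:MM.
    """
    dates_by_id = defaultdict(list)
    for annotation_id, date in rows:
        dates_by_id[annotation_id].append(date[:10])  # keep YYYY-MM-DD
    history = {}
    for annotation_id, dates in dates_by_id.items():
        if len(dates) < 2:
            continue  # a single line has no history to record
        ordered = sorted(dates)  # ISO dates sort chronologically as text
        history[annotation_id] = [f"creation-date={ordered[0]}"] + [
            f"modification-date={date}" for date in ordered[1:]
        ]
    return history
```

The returned properties would then be appended to column 12 of the most recent line in each group.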

Final File for Import

  • The final files for import will be one file of OA annotations and one file of Protein2GO GPAD annotations
  • We'll need to make the files available for Dustin to pick up somewhere for the import or for me to gzip and email him.