Noctua - Upload of WB Manual Annotations
From WormBaseWiki
Revision as of 14:53, 13 May 2021
GOC GPAD/GPI 2.0 Specifications
https://github.com/geneontology/go-annotation/blob/master/specs/gpad-gpi-2-0.md
OA Annotations
Mapping from gop_ postgres tables to GPAD 2.0 column
Skip all entries that have the value 'False Positive' in gop_falsepositive
gop_ postgres table name | GPAD 2.0 column | Action | Example |
---|---|---|---|
gop_wbgene | 1 | Preface each value with 'WB:' | WB:WBGene00006925 |
gop_qualifier | 2 | This is the NEGATION field in the GPAD2.0 file, but no action needed as we don't have any negation in our GO OA annotations. | n/a |
gop_qualifier | 3 | Map text value to Relations Ontology (RO) term id (see table below); populate with RO id. | RO:0001025 |
gop_goid | 4 | Add GO term id as it exists in table. | GO:0051306 |
gop_accession | 5 | Remove quotes and add id in table; we only have seven of these. | GO_REF:0000015 |
gop_paper | 5 | Add WBPaper ID and corresponding PMID, pipe separated. | PMID:10978280|WB:WBPaper00004310 |
gop_goinference | 6 | Map three-letter GO code to ECO code (see below); add ECO id. | ECO:0000314 |
gop_with_wbgene | 7 | Preface each value with 'WB:'; comma-separate multiple values | WB:WBGene00000001 |
gop_with | 7 | Add id as it exists in table; comma-separate multiple values | FB:FBgn0003719 |
gop_with_phenotype | 7 | Add id as it exists in table; comma-separate multiple values | WBPhenotype:0000689 |
gop_with_rnai | 7 | Preface each value with 'WB:'; comma-separate multiple values | WB:WBRNAi00001974 |
gop_with_wbvariation | 7 | Preface each value with 'WB:'; comma-separate multiple values | WB:WBVar00242156 |
- | 8 | No action; I don't think we have any values for an interacting taxon. | |
gop_lastupdate | 9 | If YYYY-MM-DD, add as exists in table. If YYYY-MM-DD HH:MM:SS convert to: YYYY-MM-DDTHH:MM | 2020-05-13 or 2006-02-03T12:26 |
no OA table | 10 | Add WB | WB |
gop_xrefto | 11 | Convert relation name to RO id, add value, directly and parenthetically, after RO id. | RO:0002233(WB:WBGene00000584) |
?? | 12 | Add postgres annotation id, prefixed with 'id=WBOA:' | id=WBOA:3565 |
gop_curator | 12 | If available, map curator to ORCID and prefix with 'contributor-id=https://orcid.org/'. If no ORCID, add 'GOC:cab1' | contributor-id=https://orcid.org/0000-0002-1478-7671 or GOC:cab1 |
gop_comment | 12 | Add free text, prefixed with 'comment=' | comment=2020-03-17; flagged FP prior to Noctua upload; no ISS With/From; more specific PAINT annotation exists. |
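The 'WB:' prefixing and the gop_lastupdate date rule in the table above could be sketched roughly as follows. This is a minimal illustration, not the actual conversion script: the function names `wb_prefix` and `gpad_date` are hypothetical, and the handling of anything after the seconds in a postgres timestamp is an assumption.

```python
from datetime import datetime


def wb_prefix(value: str) -> str:
    """Preface a WormBase id with 'WB:' unless it already carries the prefix."""
    value = value.strip()
    return value if value.startswith("WB:") else f"WB:{value}"


def gpad_date(value: str) -> str:
    """Convert a gop_lastupdate value to the GPAD 2.0 column 9 format."""
    value = value.strip()
    if len(value) == 10:
        # Date only (YYYY-MM-DD): validate, then pass through unchanged.
        datetime.strptime(value, "%Y-%m-%d")
        return value
    # Timestamp: keep 'YYYY-MM-DD HH:MM:SS' and ignore anything after the
    # seconds (assumption: fractional seconds or a time zone may follow).
    parsed = datetime.strptime(value[:19], "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m-%dT%H:%M")


print(wb_prefix("WBGene00006925"))       # WB:WBGene00006925
print(gpad_date("2020-05-13"))           # 2020-05-13
print(gpad_date("2006-02-03 12:26:45"))  # 2006-02-03T12:26
```

Values that match neither date pattern raise a ValueError, which makes malformed gop_lastupdate entries easy to spot before the file is handed over.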
Mapping gene product-to-term relation names to RO ids.
qualifier name (gop_qualifier) | RO ID | number of annotations in WS280 ace file (353 annotations total) |
---|---|---|
acts_upstream_of_or_within | RO:0002264 | 9 |
located_in | RO:0001025 | 10 |
involved_in | RO:0002331 | 307 |
enables | RO:0002327 | 18 |
part_of | BFO:0000050 | 8 |
- Note: no instances of these gp2term relations in the OA:
  - colocalizes_with (RO:0002325)
  - contributes_to (RO:0002326)
- Note: found one annotation coming from the OA that lacked a gp2term relation; updated that for WS281
Mapping annotation extension relations to RO ids
relation label | RO ID | number of annotations in WS280 ace file (353 annotations total) |
---|---|---|
has_input | RO:0002233 | 10 |
happens_during | RO:0002092 | 4 |
occurs_in | BFO:0000066 | 1 |
part_of | BFO:0000050 | 8 |
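The two relation tables above can be held as simple lookup tables. The sketch below is only illustrative: the names `GP2TERM_RELATIONS`, `EXTENSION_RELATIONS` and `normalize_extension` are hypothetical, and it assumes an annotation extension arrives as `relation(target)`, which may not be how gop_xrefto actually stores it.

```python
import re

# Gene product-to-term relations (GPAD column 3), from the first table,
# plus the two relations noted as having no instances in the OA.
GP2TERM_RELATIONS = {
    "acts_upstream_of_or_within": "RO:0002264",
    "located_in": "RO:0001025",
    "involved_in": "RO:0002331",
    "enables": "RO:0002327",
    "part_of": "BFO:0000050",
    "colocalizes_with": "RO:0002325",
    "contributes_to": "RO:0002326",
}

# Annotation extension relations (GPAD column 11), from the second table.
EXTENSION_RELATIONS = {
    "has_input": "RO:0002233",
    "happens_during": "RO:0002092",
    "occurs_in": "BFO:0000066",
    "part_of": "BFO:0000050",
}

# Assumed input shape: relation(target), e.g. has_input(WB:WBGene00000584).
EXTENSION_PATTERN = re.compile(r"^\s*([^()\s]+)\((.+)\)\s*$")


def normalize_extension(extension: str) -> str:
    """Return the extension with its relation expressed as an ontology id."""
    match = EXTENSION_PATTERN.match(extension)
    if match is None:
        raise ValueError(f"cannot parse annotation extension: {extension!r}")
    relation, target = match.groups()
    if ":" in relation:
        # Already an ontology id such as RO:0002233; leave as is.
        return f"{relation}({target})"
    if relation not in EXTENSION_RELATIONS:
        raise ValueError(f"no ontology id for relation: {relation!r}")
    return f"{EXTENSION_RELATIONS[relation]}({target})"


print(GP2TERM_RELATIONS["located_in"])
print(normalize_extension("has_input(WB:WBGene00000584)"))
```

Raising on an unmapped relation text string gives a natural hook for a parsing-failure report.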
Mapping three-letter GO codes to ECO ids.
three-letter GO code | ECO ID |
---|---|
ISS | ECO:0000250 |
IEP | ECO:0000270 |
NAS | ECO:0000303 |
TAS | ECO:0000304 |
IC | ECO:0000305 |
ND | ECO:0000307 |
IDA | ECO:0000314 |
IMP | ECO:0000315 |
IGI | ECO:0000316 |
IPI | ECO:0000353 |
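For GPAD column 6, the table above amounts to a direct lookup. A minimal sketch, where the names `GO_CODE_TO_ECO` and `evidence_to_eco` are hypothetical and unknown codes are assumed to be errors worth reporting rather than passing through:

```python
# Three-letter GO evidence codes mapped to ECO ids, copied from the table.
GO_CODE_TO_ECO = {
    "ISS": "ECO:0000250",
    "IEP": "ECO:0000270",
    "NAS": "ECO:0000303",
    "TAS": "ECO:0000304",
    "IC": "ECO:0000305",
    "ND": "ECO:0000307",
    "IDA": "ECO:0000314",
    "IMP": "ECO:0000315",
    "IGI": "ECO:0000316",
    "IPI": "ECO:0000353",
}


def evidence_to_eco(code: str) -> str:
    """Map a gop_goinference value to its ECO id, failing on unknown codes."""
    key = code.strip().upper()
    if key not in GO_CODE_TO_ECO:
        raise ValueError(f"no ECO id for GO evidence code: {code!r}")
    return GO_CODE_TO_ECO[key]


print(evidence_to_eco("IDA"))  # ECO:0000314
```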
Questions
- What about blank OA entries, e.g. pgid 14222?
  - Blank entries were ignored.
Protein2GO Annotations
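- Input files:
  - GPAD from Protein2GO
  - Latest WB gpi file: ftp://ftp.wormbase.org/pub/wormbase/species/c_elegans/PRJNA13758/annotation/gene_product_info/c_elegans.canonical_bioproject.current_development.gene_product_info.gpi.gz
- Other mappings needed:
  - PMID to WBPaper
  - Relation text to gorel id
  - Curators without orcids to GOC abbreviations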
GPAD 2.0 column number | GPAD 2.0 column name | Action | UniProt Source File Example | WormBase Output File Example | Report on parsing failures |
---|---|---|---|---|---|
1 | Annotated entity* | Convert each UniProtKB: accession to a WBGene id using the latest WB gpi file | UniProtKB:G5ED58 | WB:WBGene00006925 | Yes |
2 | Negation | Leave as is | NOT | NOT | No |
3 | Qualifier | Leave as is | RO:0002327 | RO:0002327 | No |
4 | GO term ID | Leave as is | GO:0051306 | GO:0051306 | No |
5 | Reference* | Leave GO_REFs as is; map PMID or DOI to corresponding WBPaper id and add WBPaper id as a pipe-separated value | PMID:10978280 | PMID:10978280|WB:WBPaper00004310 | Yes |
6 | Evidence | Leave as is | ECO:0000314 | ECO:0000314 | No |
7 | With/From* | Leave as is, except for UniProtKB: accessions; for UniProtKB: accessions, try to map to a WBGene id using the latest WB gpi file; if UniProtKB: accession doesn't map to a WBGene id, then leave as is | UniProtKB:D9PTP8 | WB:WBGene00013354 | Yes - output a list of UniProtKB accessions that didn't map to a WBGene |
8 | Interacting taxon | Leave as is | NCBITaxon:273526 | NCBITaxon:273526 | No |
9 | Annotation date | Leave as is | 2006-02-03T12:26 | 2006-02-03T12:26 | No |
10 | Assigned_by | Leave as is | WB | WB | No |
11 | Annotation extensions | If relation is a text string, convert to an id according to the RO mapping table above. Otherwise, leave as is. | RO:0002233(WB:WBGene00000584) | RO:0002233(WB:WBGene00000584) | Yes - report on any relation text strings that don't map to an ontology id. |
12 | Annotation properties* | Most will stay as is, except for history (see below). | contributor-id=https://orcid.org/0000-0002-1706-4196|comment=action:Updated by Kimberly Van Auken|model-state=??? |
Annotated Entity
- Use latest WormBase gpi file to map UniProtKB accessions (column 1 in incoming GPAD file; column 9 in WB gpi file) to corresponding WBGene ID (column 2 in WB gpi file)
  - Latest WormBase gpi file: ftp://ftp.wormbase.org/pub/wormbase/species/c_elegans/PRJNA13758/annotation/gene_product_info/c_elegans.canonical_bioproject.current_development.gene_product_info.gpi.gz
- For input values that include a '-digit' suffix, e.g. UniProtKB:P37806-1, strip the suffix for the purposes of mapping to a WBGene id, and add a comment to the Annotation properties field that records the full original UniProtKB accession. For example: comment=Original annotation made to UniProtKB:P37806-1.
- We have made some manual annotations to organisms other than C. elegans, so there will be some GPAD lines for which there is no mapping from the UniProtKB accession to a WBGene id.
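The annotated-entity steps above could look something like the sketch below. It is written under several assumptions that should be checked against the real gpi file: that column 9 holds pipe-separated xrefs, that column 2 holds the WBGene id with or without a 'WB:' prefix, and that header lines start with '!'. The function names are hypothetical.

```python
import gzip
import re

ISOFORM_SUFFIX = re.compile(r"-\d+$")


def load_uniprot_to_wbgene(gpi_path):
    """Build a UniProtKB accession -> WB:WBGene lookup from a gpi file."""
    mapping = {}
    opener = gzip.open if str(gpi_path).endswith(".gz") else open
    with opener(gpi_path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("!") or not line.strip():
                continue  # skip header and blank lines
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 9:
                continue  # no xref column on this line
            wbgene = cols[1] if cols[1].startswith("WB:") else f"WB:{cols[1]}"
            for xref in cols[8].split("|"):
                if xref.startswith("UniProtKB:"):
                    mapping[xref] = wbgene
    return mapping


def map_annotated_entity(accession, mapping):
    """Return (WBGene id or None, comment property or None) for column 1."""
    base = ISOFORM_SUFFIX.sub("", accession)
    comment = None
    if base != accession:
        # Record the original isoform accession for the Annotation properties.
        comment = f"comment=Original annotation made to {accession}."
    return mapping.get(base), comment


# Placeholder lookup for illustration only; not a real UniProtKB-to-gene pairing.
demo = {"UniProtKB:P37806": "WB:WBGene00000001"}
print(map_annotated_entity("UniProtKB:P37806-1", demo))
```

A None in the first position marks an accession with no WBGene mapping (for example, a non-C. elegans protein), which can be collected for the parsing-failure report. The same lookup could serve the With/From column, where unmapped accessions are left as is.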
References
- Map incoming doi or PMID to WBPaper id using the pap_identifier table.
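As a rough sketch of the reference step, assuming the pap_identifier table has already been read into a plain dictionary keyed by the incoming identifier (the `paper_lookup` argument and the function name `add_wbpaper` are hypothetical, and the table's actual layout is not shown here):

```python
def add_wbpaper(reference, paper_lookup):
    """Return (column 5 value, mapped?) for one incoming reference.

    paper_lookup is assumed to map identifiers such as 'PMID:10978280'
    to WBPaper ids such as 'WBPaper00004310'.
    """
    if reference.startswith("GO_REF:"):
        return reference, True  # GO_REFs are left as is
    wbpaper = paper_lookup.get(reference)
    if wbpaper is None:
        return reference, False  # keep the value, flag it for the report
    return f"{reference}|WB:{wbpaper}", True


lookup = {"PMID:10978280": "WBPaper00004310"}
print(add_wbpaper("PMID:10978280", lookup))
print(add_wbpaper("GO_REF:0000015", lookup))
```

The boolean in the result lets the caller list every PMID or DOI that did not map to a WBPaper id.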
With/From
- Use latest WormBase gpi file to map UniProtKB accessions (column 1 in incoming GPAD file; column 9 in WB gpi file) to corresponding WBGene ID (column 2 in WB gpi file)
  - Latest WormBase gpi file: ftp://ftp.wormbase.org/pub/wormbase/species/c_elegans/PRJNA13758/annotation/gene_product_info/c_elegans.canonical_bioproject.current_development.gene_product_info.gpi.gz
Annotation Properties
- Contributor
  - Most contributors are captured with an orcid.
  - However, Carol and Josh do not have orcids, so we need to populate a GOC abbreviation for their contributor id.
    - If contributor-id is blank and comment=action:Added by Josh Jaffery [Expired account], then populate contributor-id=GOC:jja
    - If contributor-id is blank and comment=action:Added by Carol Bastiani [Expired account], then populate contributor-id=GOC:cab1
- History
  - Group annotations by id, e.g. id=GOA:2113472118
  - If more than one line has the same id, check the corresponding date field (column 9) for each line
  - On the line with the most recent date in each group, add creation-date=YYYY-MM-DD from the earliest annotation and a modification-date=YYYY-MM-DD for each subsequent date
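The contributor and history rules above might be sketched as below. This is one reading of the rules rather than a settled implementation: it assumes column 12 is a pipe-separated list of key=value properties, that a "blank" contributor-id is either absent or empty, and that revisions made on the same day collapse to a single date. All function names are hypothetical.

```python
import re
from collections import defaultdict

# Expired-account comments and the GOC abbreviation to use for each.
EXPIRED_CURATORS = {
    "comment=action:Added by Josh Jaffery [Expired account]": "GOC:jja",
    "comment=action:Added by Carol Bastiani [Expired account]": "GOC:cab1",
}

ID_PATTERN = re.compile(r"(?:^|\|)id=([^|]+)")


def fill_contributor(properties: str) -> str:
    """Add a GOC contributor-id when none is set and the comment matches."""
    parts = properties.split("|") if properties else []
    for part in parts:
        if part.startswith("contributor-id=") and part != "contributor-id=":
            return properties  # a contributor is already recorded
    for part in parts:
        if part in EXPIRED_CURATORS:
            kept = [p for p in parts if p != "contributor-id="]
            return "|".join([f"contributor-id={EXPIRED_CURATORS[part]}"] + kept)
    return properties


def group_dates_by_id(rows):
    """Collect column 9 dates per annotation id found in column 12."""
    groups = defaultdict(list)
    for row in rows:  # each row is a list of the 12 GPAD columns
        match = ID_PATTERN.search(row[11])
        if match:
            groups[match.group(1)].append(row[8])
    return groups


def history_properties(dates):
    """Return creation/modification properties for one group of dates."""
    days = sorted({d[:10] for d in dates})  # ISO dates sort chronologically
    props = [f"creation-date={days[0]}"]
    props += [f"modification-date={day}" for day in days[1:]]
    return props


print(fill_contributor("comment=action:Added by Carol Bastiani [Expired account]"))
print(history_properties(["2010-01-05T10:00", "2006-02-03T12:26"]))
```

The properties returned by `history_properties` would be appended to column 12 of the most recent line in each group; what happens to the older lines is not specified above, so the sketch leaves them untouched.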
Final File for Import
- The final files for import will be a file of OA annotations and a file of Protein2GO GPAD annotations
- We'll need to either make the files available somewhere for Dustin to pick up for the import, or I can gzip them and email them to him.