Difference between revisions of "GO-CAM GPAD"
From WormBaseWiki
Revision as of 20:19, 2 November 2018
Contents
- 1 Specifications for Converting GO-CAM GPAD to .ace File
- 1.1 Source File Location
- 1.2 GO_Annotation IDs - finding the right number from which to start
- 1.3 SynGO Annotations
- 1.4 Columns and Mappings
- 1.4.1 Column 1: DB Prefix
- 1.4.2 Column 2: DB Object ID
- 1.4.3 Column 3: Relation
- 1.4.4 Column 4: GO ID
- 1.4.5 Column 5: Reference
- 1.4.6 Column 6: Evidence Code
- 1.4.7 Column 7: With/From
- 1.4.8 Column 8: Interacting taxon
- 1.4.9 Column 9: Date
- 1.4.10 Column 10: Assigned_by
- 1.4.11 Column 11: Annotation Extensions
- 1.4.12 Column 12: Annotation Properties
Specifications for Converting GO-CAM GPAD to .ace File
Source File Location
GO_Annotation IDs - finding the right number from which to start
- As with the OA annotations, we'll need to consult the .ace GO annotation files sequentially to find the correct starting number for the GO_annotation objects generated from this GPAD file.
- See code at lines 45-49 of /home/postgres/work/citace_upload/go_curation/get_go_annotation_ace.pm
  my $annotCounter = 0;
  my $annotFile = '/home/acedb/kimberly/citace_upload/go/gp_annotation.ace';
  open (IN, "<$annotFile") or die "Cannot open $annotFile : $!";
  while (my $line = <IN>) { if ($line =~ m/GO_annotation : "(\d+)"/) { $annotCounter = $1; } }
  close (IN) or die "Cannot close $annotFile : $!";
- In this case, I would move the go_annotation.ace file to the go/ directory, as I do for the gp_annotation.ace file above, and we would start from the next number after the highest ID found in the go_annotation.ace file.
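The "consult the files sequentially" step could be sketched as follows. This is only an illustration: the sub name highest_annotation_id and the idea of passing it a list of files are assumptions, not part of get_go_annotation_ace.pm. The regular expression is the one from the snippet above. Unlike that loop, which keeps the last ID it sees, this sketch keeps the largest ID, so the order of the files does not matter.

```perl
use strict;
use warnings;

# Hypothetical helper: return the highest GO_annotation ID found in the
# given .ace files (0 if none is found), so that new objects can start
# at the returned value + 1.
sub highest_annotation_id {
    my @files = @_;
    my $max = 0;
    foreach my $file (@files) {
        open(my $in, '<', $file) or die "Cannot open $file : $!";
        while (my $line = <$in>) {
            # Same pattern as in get_go_annotation_ace.pm
            if ($line =~ m/GO_annotation : "(\d+)"/) {
                my $id = $1 + 0;          # drop any zero padding
                $max = $id if $id > $max;
            }
        }
        close($in) or die "Cannot close $file : $!";
    }
    return $max;
}
```

A call such as `my $next = highest_annotation_id($gpFile, $goFile) + 1;` would then give the first free number, where $gpFile and $goFile stand for the two .ace files in the go/ directory.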
SynGO Annotations
- Proposal: skip all annotations that are Assigned_by SynGO and get these from UniProt-GOA because the evidence codes are already mapped for us for GAF.
- The reason this matters is that SynGO uses evidence codes other than the three-letter codes and so their evidence codes need to be mapped to the traditional three-letter codes.
- BUT we also need to double-check that all of the SynGO annotations in Noctua GPAD are also in Protein2GO, and confirm with Tony, Alex, Dustin, Paul, etc. how new SynGO annotations will enter the pipeline.
- In Protein2GO, SynGO annotations have Source 'SYNX'.
- Application of evidence codes looks to be different from GO guidelines.
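If the proposal to skip SynGO annotations is adopted, the filter could look roughly like this. It is a minimal sketch that assumes the GPAD file is tab-delimited with Assigned_by in column 10, as in the column list on this page. The sub name is_syngo_line is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) if a GPAD data line is Assigned_by SynGO.
# Assumes tab-delimited columns with Assigned_by in column 10 (index 9).
sub is_syngo_line {
    my ($line) = @_;
    chomp $line;
    my @cols = split /\t/, $line, -1;    # -1 keeps trailing empty columns
    return (defined $cols[9] && $cols[9] eq 'SynGO') ? 1 : 0;
}
```

In the main parsing loop this would be used as `next if is_syngo_line($line);` before any .ace output is written for that line.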
Columns and Mappings
Column 1: DB Prefix
- Ignore
Column 2: DB Object ID
- Populate Gene field in ?GO_annotation (no conversion needed)
Column 3: Relation
- Populate Annotation_relation field with RO identifier
- For now, need to map the text string to an RO identifier (no NOT annotations here, yet)
- Subroutine exists in goa gpad parsing script, but we need to add all of the relations
- And check on how NOT is handled
  sub populateAnnotrelToRo {
    $annotToRo{'colocalizes_with'}                           = 'RO:0002325';
    $annotToRo{'contributes_to'}                             = 'RO:0002326';
    $annotToRo{'enables'}                                    = 'RO:0002327';
    $annotToRo{'involved_in'}                                = 'RO:0002331';
    $annotToRo{'part_of'}                                    = 'BFO:0000050';
    $annotToRo{'acts_upstream_of_or_within'}                 = 'RO:0002264';
    $annotToRo{'acts_upstream_of'}                           = 'RO:0002263';
    $annotToRo{'acts_upstream_of_or_within_negative_effect'} = 'RO:0004033';
    $annotToRo{'acts_upstream_of_or_within_positive_effect'} = 'RO:0004032';
    $annotToRo{'acts_upstream_of_negative_effect'}           = 'RO:0004035';
    $annotToRo{'acts_upstream_of_positive_effect'}           = 'RO:0004034';
  } # sub populateAnnotrelToRo
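Because not every relation is in the table yet, the lookup should probably make unmapped relations visible rather than silently writing an empty Annotation_relation. A minimal sketch, with a trimmed copy of the table for illustration (the full set is in populateAnnotrelToRo) and a hypothetical sub name relation_to_ro:

```perl
use strict;
use warnings;

# Trimmed copy of the mapping, for illustration only.
my %annotToRo = (
    'enables'     => 'RO:0002327',
    'involved_in' => 'RO:0002331',
    'part_of'     => 'BFO:0000050',
);

# Hypothetical helper: return the identifier for a relation string,
# or undef if the relation has not been added to the table yet.
sub relation_to_ro {
    my ($relation) = @_;
    return exists $annotToRo{$relation} ? $annotToRo{$relation} : undef;
}
```

The caller can then report any relation for which relation_to_ro returns undef to the error file, in the same way the With/From parser reports unrecognized values. NOT annotations are deliberately left out, since how they are represented is still an open question here.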
Column 4: GO ID
- GO_term (no conversion needed)
Column 5: Reference
- Convert PMID to WBPaper
- Convert DOI to WBPaper
- Populate GO_REF as usual
- PAINT_REF - convert to appropriate WBPaper (but why do we have annotations with PAINT_REF and are they in the GO database somewhere? They need to be updated in the GO database.)
See lines 217 - 228 in /home/acedb/kimberly/citace_upload/go/gpad2ace/2018_October/go_gpad_parser.pl
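For orientation, the reference handling described above could be sketched as below. This is not the code at lines 217 - 228 of go_gpad_parser.pl. The sub name convert_reference and the two lookup hashes (PMID to WBPaper, DOI to WBPaper) are assumptions, and how those hashes are filled is left open. PAINT_REF is intentionally not handled because its mapping is still unresolved.

```perl
use strict;
use warnings;

# Hypothetical helper: convert one Column 5 reference.
# $pmid2paper and $doi2paper are assumed hash references that map
# PMIDs and DOIs to WBPaper IDs. Returns undef when nothing matches.
sub convert_reference {
    my ($ref, $pmid2paper, $doi2paper) = @_;
    if ($ref =~ m/^PMID:(\d+)$/)  { return $pmid2paper->{$1}; }
    if ($ref =~ m/^DOI:(.+)$/)    { return $doi2paper->{$1}; }
    if ($ref =~ m/^GO_REF:\d+$/)  { return $ref; }   # populated as usual
    return undef;                                     # e.g. PAINT_REF, for now
}
```

An undef return value marks a reference that needs curator attention, either because the paper is not in the lookup or because the prefix is not handled yet.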
Column 6: Evidence Code
- Convert ECO ID to 3-letter GO code
- This part will probably need to be newly written: Tony maps the ECO to GO code in an annotation properties field, but this GPAD currently doesn't have that.
  ECO:0000250 ISS
  ECO:0000270 IEP
  ECO:0000304 TAS
  ECO:0000307 ND
  ECO:0000314 IDA
  ECO:0000315 IMP
  ECO:0000316 IGI
  ECO:0000318 IBA
  ECO:0000352 ???
  ECO:0000353 IPI
  ECO:0000501 IEA
  ECO:0005611 IDA (but check these as ECO definition is not quite clear/familiar)
  ECO:0007007 HEP
  ECO:0007293 IEA (also check this one as it is: experimental evidence used in automatic assertion)
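Until the mapping is available from an annotation properties field, the table above could be carried in the parser as a plain hash. A minimal sketch: the names %ecoToGo and eco_to_go_code are assumptions, ECO:0000352 is left out because its GO code is still undecided, and the two entries flagged for review are marked as provisional.

```perl
use strict;
use warnings;

# ECO ID => three-letter GO evidence code, copied from the table above.
my %ecoToGo = (
    'ECO:0000250' => 'ISS',
    'ECO:0000270' => 'IEP',
    'ECO:0000304' => 'TAS',
    'ECO:0000307' => 'ND',
    'ECO:0000314' => 'IDA',
    'ECO:0000315' => 'IMP',
    'ECO:0000316' => 'IGI',
    'ECO:0000318' => 'IBA',
    'ECO:0000353' => 'IPI',
    'ECO:0000501' => 'IEA',
    'ECO:0005611' => 'IDA',   # provisional, ECO definition still to be checked
    'ECO:0007007' => 'HEP',
    'ECO:0007293' => 'IEA',   # provisional, still to be checked
);

# Hypothetical helper: return the GO code, or undef for an unmapped ECO ID.
sub eco_to_go_code {
    my ($eco) = @_;
    return $ecoToGo{$eco};
}
```

Any ECO ID that comes back undef, including ECO:0000352 for now, should be written to the error file so the table can be extended.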
Column 7: With/From
- What’s here from go_gpad parser:
  foreach my $with (@withs) {
    if ($with =~ m/With:Not_supplied/) { 1; } # do nothing
    elsif ($with =~ m/^UniProtKB:(\w+)/) { print ACE qq(Database\t"UniProt"\t"UniProtAcc"\t"$1"\n); }
    elsif ($with =~ m/^InterPro:(IPR\d+)/) { print ACE qq(Motif\t"INTERPRO:$1"\n); }
    elsif ($with =~ m/^HGNC:(\d+)/) { print ACE qq(Database\t"HGNC"\t"HGNCID"\t"$1"\n); }
    elsif ($with =~ m/^MGI:(MGI:\d+)/) { print ACE qq(Database\t"MGI"\t"MGIID"\t"$1"\n); }
    elsif ($with =~ m/^HAMAP:(MF_\d+)/) { print ACE qq(Database\t"HAMAP"\t"HAMAP_annotation_rule"\t"$1"\n); }
    elsif ($with =~ m/^EC:([\.\d]+)/) { print ACE qq(Database\t"KEGG"\t"KEGG_id"\t"$1"\n); }
    elsif ($with =~ m/^UniPathway:(UPA\d+)/) { print ACE qq(Database\t"UniPathway"\t"Pathway_id"\t"$1"\n); }
    elsif ($with =~ m/^UniProtKB-KW:(KW-\d+)/) { print ACE qq(Database\t"UniProt"\t"UniProtKB-KW"\t"$1"\n); }
    elsif ($with =~ m/^UniProtKB-SubCell:(SL-\d+)/) { print ACE qq(Database\t"UniProt"\t"UniProtKB-SubCell"\t"$1"\n); }
    elsif ($with =~ m/^UniRule:(\w+)/) { print ACE qq(Database\t"UniProt"\t"UniRule"\t"$1"\n); }
    elsif ($with =~ m/^(GO:\d+)/) { print ACE qq(Inferred_from_GO_term\t"$1"\n); }
    elsif ($with =~ m/^WB:(WBVar\d+)/) { print ACE qq(Variation\t"$1"\n); }
    elsif ($with =~ m/^WB:(WBRNAi\d+)/) { print ACE qq(RNAi_result\t"$1"\n); }
    elsif ($with =~ m/^WB:(WBGene\d+)/) { print ACE qq(Interacting_gene\t"$1"\n); }
    elsif ($with =~ m/^(WBGene\d+)/) { print ACE qq(Interacting_gene\t"$1"\n); }
    elsif ($with =~ m/^PomBase:([\.\w]+)/) { print ACE qq(Database\t"PomBase"\t"PomBase_systematic_name"\t"$1"\n); }
    elsif ($with =~ m/^SGD:(S\d+)/) { print ACE qq(Database\t"SGD"\t"SGDID"\t"$1"\n); }
    elsif ($with =~ m/^PANTHER:(PTN\d+)/) { print ACE qq(Database\t"Panther"\t"PanTree_node"\t"$1"\n); }
    elsif ($with =~ m/^TAIR:locus:(\d+)/) { print ACE qq(Database\t"TAIR"\t"TAIR_locus_id"\t"$1"\n); }
    elsif ($with =~ m/^FB:(FBgn\d+)/) { print ACE qq(Database\t"FLYBASE"\t"FLYBASEID"\t"$1"\n); }
    elsif ($with =~ m/^RGD:(\d+)/) { print ACE qq(Database\t"RGD"\t"RGDID"\t"$1"\n); }
    elsif ($with =~ m/^dictyBase:(DDB_G\d+)/) { print ACE qq(Database\t"dictyBase"\t"dictyBaseID"\t"$1"\n); }
    else { print ERR qq(WITH $with not acounted in .ace file\n); }
  } # foreach my $with (@withs)
Column 8: Interacting taxon
- We currently don’t have any entries for this, but could reuse code from the go_gpad parser
Column 9: Date
- YYYYMMDD - (no conversion needed)
Column 10: Assigned_by
- WB
- SynGO (Need to upload an ?Analysis object that has the relevant details - file done 2018-10-23)
Column 11: Annotation Extensions
- Parse as for go_gpad parsing script
- Note that these are all RO relations
- Also, SynGO is using UBERON for occurs_in and evidence codes like ‘knockout’ that don’t make sense for the papers they’re curating.
Column 12: Annotation Properties
- Nothing to convert here.
- At some point, we may want to think about capturing some of this metadata, but will need thorough review first.
- Check curator; I’m still not sure we’re handling this correctly when we import existing annotations into GO-CAM models.