GO-CAM GPAD
From WormBaseWiki
Revision as of 19:22, 1 November 2018
Contents
- 1 Specifications for Converting GO-CAM GPAD to .ace File
- 1.1 Source File Location
- 1.2 GO_Annotation IDs - finding the right number from which to start
- 1.3 SynGO Annotations
- 1.4 Columns and Mappings
- 1.4.1 Column 1: DB Prefix
- 1.4.2 Column 2: DB Object ID
- 1.4.3 Column 3: Relation
- 1.4.4 Column 4: GO ID
- 1.4.5 Column 5: Reference
- 1.4.6 Column 6: Evidence Code
- 1.4.7 Column 7: With/From
- 1.4.8 Column 8: Interacting taxon
- 1.4.9 Column 9: Date
- 1.4.10 Column 10: Assigned_by
- 1.4.11 Column 11: Annotation Extensions
- 1.4.12 Column 12: Annotation Properties
Specifications for Converting GO-CAM GPAD to .ace File
Source File Location
GO_Annotation IDs - finding the right number from which to start
- Handle these like the OA annotations: consult the .ace GO annotation files sequentially to find the correct starting number for the GO_annotation objects generated from this GPAD file.
- See code at lines 45-49 of /home/postgres/work/citace_upload/go_curation/get_go_annotation_ace.pm
 my $annotCounter = 0;
 my $annotFile = '/home/acedb/kimberly/citace_upload/go/gp_annotation.ace';
 open (IN, "<$annotFile") or die "Cannot open $annotFile : $!";
 while (my $line = <IN>) {
   if ($line =~ m/GO_annotation : "(\d+)"/) { $annotCounter = $1; }
 }
 close (IN) or die "Cannot close $annotFile : $!";
- In this case, I would move the go_annotation.ace file to the go/ directory, as I do for the gp_annotation.ace file above, and start numbering from the next number after the highest ID found in go_annotation.ace.
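The sequential lookup described above could be sketched as follows. This is only an illustration: the sub name `highest_annotation_id` is hypothetical, and unlike the snippet above (which keeps the last ID seen in one file) it keeps the largest ID seen across all files passed in, on the assumption that IDs are not guaranteed to be in order.

```perl
use strict;
use warnings;

# Hypothetical helper: return the highest GO_annotation ID found across
# the given .ace files (0 if none is found). File paths are supplied by
# the caller; none are hard-coded here.
sub highest_annotation_id {
    my @ace_files = @_;
    my $max = 0;
    for my $file (@ace_files) {
        open(my $in, '<', $file) or die "Cannot open $file : $!";
        while (my $line = <$in>) {
            # Same pattern as the existing parser code.
            if ($line =~ m/GO_annotation : "(\d+)"/) {
                $max = $1 if $1 > $max;
            }
        }
        close($in) or die "Cannot close $file : $!";
    }
    return $max;
}
```

Calling it with gp_annotation.ace and go_annotation.ace in turn would give the number after which the new GO_annotation objects start.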
SynGO Annotations
- Proposal: skip all annotations that are Assigned_by SynGO and get these from UniProt-GOA because the evidence codes are already mapped for us for GAF.
- The reason this matters is that SynGO uses evidence codes other than the three-letter codes and so their evidence codes need to be mapped to the traditional three-letter codes.
- However, we also need to double-check that all of the SynGO annotations in Noctua GPAD are also in Protein2GO, and confirm with Tony, Alex, Dustin, Paul, etc. how new SynGO annotations will enter the pipeline.
- In Protein2GO, SynGO annotations have Source 'SYNX'.
- Application of evidence codes looks to be different from GO guidelines.
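If the proposal to skip SynGO annotations is adopted, the filter might look like the sketch below. It assumes the GPAD file is tab-delimited with Assigned_by in column 10, that header lines start with `!`, and that the value is spelled exactly `SynGO`; the sub name `is_syngo_line` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) if a GPAD data line is assigned by SynGO.
# Assumes tab-delimited columns with Assigned_by in column 10 (index 9).
sub is_syngo_line {
    my ($line) = @_;
    return 0 if $line =~ /^!/;          # header/comment line
    chomp $line;
    my @cols = split /\t/, $line, -1;   # -1 keeps trailing empty columns
    return (defined $cols[9] && $cols[9] eq 'SynGO') ? 1 : 0;
}
```

In the parsing loop this would be used as `next if is_syngo_line($line);`.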
Columns and Mappings
Column 1: DB Prefix
- Ignore
Column 2: DB Object ID
- Populate Gene field in ?GO_annotation (no conversion needed)
Column 3: Relation
- Populate Annotation_relation field with RO identifier
- For now, need to map the text string to an RO identifier (no NOT annotations here, yet)
- Subroutine exists in goa gpad parsing script, but we need to add all of the relations
- And check on how NOT is handled
 sub populateAnnotrelToRo {
   $annotToRo{'colocalizes_with'} = 'RO:0002325';
   $annotToRo{'contributes_to'} = 'RO:0002326';
   $annotToRo{'enables'} = 'RO:0002327';
   $annotToRo{'involved_in'} = 'RO:0002331';
   $annotToRo{'part_of'} = 'BFO:0000050';
   $annotToRo{'acts_upstream_of_or_within'} = 'RO:0002264';
   $annotToRo{'acts_upstream_of'} = 'RO:0002263';
   $annotToRo{'acts_upstream_of_or_within_negative_effect'} = 'RO:0004033';
   $annotToRo{'acts_upstream_of_or_within_positive_effect'} = 'RO:0004032';
   $annotToRo{'acts_upstream_of_negative_effect'} = 'RO:0004035';
   $annotToRo{'acts_upstream_of_positive_effect'} = 'RO:0004034';
 } # sub populateAnnotrelToRo
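One possible way to handle NOT, once such annotations appear, is sketched below. It assumes that a negated annotation arrives in column 3 as a pipe-separated value such as `NOT|enables`; that format needs to be confirmed against the actual file. The sub name `parse_relation` is hypothetical, and the hash here holds only a subset of the relations for brevity.

```perl
use strict;
use warnings;

# Subset of the relation-to-RO table, for illustration only.
my %annotToRo = (
    'enables'     => 'RO:0002327',
    'involved_in' => 'RO:0002331',
    'part_of'     => 'BFO:0000050',
);

# Hypothetical helper: split column 3 into an optional NOT flag and a
# relation, then look the relation up. Returns (is_not, ro_id); ro_id is
# undef when the relation is missing from the table.
sub parse_relation {
    my ($col3) = @_;
    my @parts  = split /\|/, $col3;
    my $is_not = (grep { $_ eq 'NOT' } @parts) ? 1 : 0;
    my ($rel)  = grep { $_ ne 'NOT' } @parts;
    my $ro     = defined $rel ? $annotToRo{$rel} : undef;
    return ($is_not, $ro);
}
```

Returning undef for an unmapped relation lets the caller log the line instead of writing an empty Annotation_relation field.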
Column 4: GO ID
- GO_term (no conversion needed)
Column 5: Reference
- Convert PMID to WBPaper
- Convert DOI to WBPaper
- Populate GO_REF as usual
- PAINT_REF - convert to appropriate WBPaper (but why do we have annotations with PAINT_REF and are they in the GO database somewhere? They need to be updated in the GO database.)
See lines 217 - 228 in /home/acedb/kimberly/citace_upload/go/gpad2ace/2018_October/go_gpad_parser.pl
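The reference handling could be sketched roughly as below. This is not the code at the lines cited above. The hash `%refToPaper` and the sub `convert_reference` are hypothetical names, and the entries in the hash are placeholders rather than real paper mappings; in practice the table would be loaded from the existing PMID/DOI to WBPaper mappings.

```perl
use strict;
use warnings;

# Hypothetical lookup keyed by the full reference identifier.
# These entries are placeholders, not real mappings.
my %refToPaper = (
    'PMID:0000000'            => 'WBPaper00000000',
    'DOI:10.0000/placeholder' => 'WBPaper00000000',
);

# Hypothetical helper: GO_REF identifiers pass through unchanged;
# anything else is looked up and yields undef when no mapping exists.
sub convert_reference {
    my ($ref) = @_;
    return $ref if $ref =~ /^GO_REF:/;
    return $refToPaper{$ref};
}
```

PAINT_REF identifiers could be added to the same table once the appropriate WBPaper for each is known.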
Column 6: Evidence Code
- Convert ECO ID to 3-letter GO code
- This part will probably need to be newly written: Tony maps the ECO ID to the GO code in an annotation properties field, but this GPAD currently doesn't have that.
 ECO:0000250 ISS
 ECO:0000270 IEP
 ECO:0000304 TAS
 ECO:0000307 ND
 ECO:0000314 IDA
 ECO:0000315 IMP
 ECO:0000316 IGI
 ECO:0000318 IBA
 ECO:0000352 ???
 ECO:0000353 IPI
 ECO:0000501 IEA
 ECO:0005611 IDA (but check these as ECO definition is not quite clear/familiar)
 ECO:0007007 HEP
 ECO:0007293 IEA (also check this one as it is: experimental evidence used in automatic assertion)
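The ECO-to-GO-code mapping listed for this column could be held in a hash, as in the sketch below. The names `%ecoToGo` and `eco_to_go` are hypothetical. ECO:0000352 is deliberately left out because its GO code is still undetermined, so it comes back as undef and can be reported.

```perl
use strict;
use warnings;

# Mapping taken from the list for this column. ECO:0005611 and
# ECO:0007293 are the two entries still flagged for checking.
my %ecoToGo = (
    'ECO:0000250' => 'ISS',
    'ECO:0000270' => 'IEP',
    'ECO:0000304' => 'TAS',
    'ECO:0000307' => 'ND',
    'ECO:0000314' => 'IDA',
    'ECO:0000315' => 'IMP',
    'ECO:0000316' => 'IGI',
    'ECO:0000318' => 'IBA',
    'ECO:0000353' => 'IPI',
    'ECO:0000501' => 'IEA',
    'ECO:0005611' => 'IDA',   # to be checked
    'ECO:0007007' => 'HEP',
    'ECO:0007293' => 'IEA',   # to be checked
);

# Hypothetical helper: return the GO evidence code, or undef if unmapped.
sub eco_to_go {
    my ($eco) = @_;
    return $ecoToGo{$eco};
}
```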
Column 7: With/From
- What’s here from go_gpad parser:
UniProtKB
UniProtKB-KW
UniProtKB-SubCell
UniPathway
UniRule
WB:WBGene
WB:WBVar
InterPro
EC
PANTHER
PomBase
SGD
UniProt (this needs to be fixed in our GAF!! - see ‘Modification to our GAF file output’ email thread with Kevin Howe beginning on 2018-07-17; should be okay in WS267)
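A simple check of With/From values against the prefixes listed above might look like the following sketch. It assumes entries in column 7 are separated by `|` or `,` and written as `prefix:identifier`; the names `@knownWithPrefixes` and `unknown_with_entries` are hypothetical. The bare `UniProt` prefix is left out of the known list on purpose so that any remaining occurrences are reported.

```perl
use strict;
use warnings;

# Prefixes from the list for this column, without the bare 'UniProt'.
my @knownWithPrefixes = (
    'UniProtKB', 'UniProtKB-KW', 'UniProtKB-SubCell', 'UniPathway',
    'UniRule', 'WB:WBGene', 'WB:WBVar', 'InterPro', 'EC', 'PANTHER',
    'PomBase', 'SGD',
);

# Hypothetical helper: return the With/From entries whose prefix is not
# in the known list. An empty or undefined column yields an empty list.
sub unknown_with_entries {
    my ($col7) = @_;
    return () if !defined $col7 || $col7 eq '';
    my @unknown;
    for my $entry (split /[|,]/, $col7) {
        my $ok = 0;
        for my $p (@knownWithPrefixes) {
            # 'WB:WBGene' style prefixes already contain the colon.
            my $needle = ($p =~ /:/) ? $p : "$p:";
            if (index($entry, $needle) == 0) { $ok = 1; last; }
        }
        push @unknown, $entry unless $ok;
    }
    return @unknown;
}
```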
Column 8: Interacting taxon
- Currently don’t have any entries for this, but could reuse code from go_gpad parser
Column 9: Date
- YYYYMMDD - (no conversion needed)
Column 10: Assigned_by
- WB
- (SynGO - Need to upload an ?Analysis object that has the relevant details - file done 2018-10-23)
Column 11: Annotation Extensions
- Parse as for go_gpad parsing script
- Note that these are all RO relations
- Also, SynGO is using UBERON for occurs_in and evidence codes like ‘knockout’ that don’t make sense for the papers they’re curating.
Column 12: Annotation Properties
- Nothing to convert here.
- At some point, we may want to think about capturing some of this metadata, but will need thorough review first.
- Check curator; I’m still not sure we’re handling this correctly when we import existing annotations into GO-CAM models.