Difference between revisions of "GO-CAM GPAD"
From WormBaseWiki
Revision as of 20:51, 23 October 2018
Contents
- 1 Specifications for Converting GO-CAM GPAD to .ace File
- 1.1 Source File Location
- 1.2 GO_Annotation IDs - finding the right number from which to start
- 1.3 SynGO Annotations
- 1.4 Columns and Mappings
- 1.4.1 Column 1: DB Prefix
- 1.4.2 Column 2: DB Object ID
- 1.4.3 Column 3: Relation
- 1.4.4 Column 4: GO ID
- 1.4.5 Column 5: Reference
- 1.4.6 Column 6: Evidence Code
- 1.4.7 Column 7: With/From
- 1.4.8 Column 8: Interacting taxon
- 1.4.9 Column 9: Date
- 1.4.10 Column 10: Assigned_by
- 1.4.11 Column 11: Annotation Extensions
- 1.4.12 Column 12: Annotation Properties
Specifications for Converting GO-CAM GPAD to .ace File
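Source File Location
- http://build.berkeleybop.org/job/export-lego-to-gpad-sparql/lastSuccessfulBuild/artifact/legacy/wb.gpad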
GO_Annotation IDs - finding the right number from which to start
- These will need to be handled like the OA annotations: we'll need to consult the .ace GO annotation files sequentially to get the correct starting number for the GO_annotation objects generated from this GPAD file.
- See code at lines 45-49 of /home/postgres/work/citace_upload/go_curation/get_go_annotation_ace.pm
my $annotCounter = 0;
my $annotFile = '/home/acedb/kimberly/citace_upload/go/gp_annotation.ace';
open (IN, "<$annotFile") or die "Cannot open $annotFile : $!";
while (my $line = <IN>) {
  if ($line =~ m/GO_annotation : "(\d+)"/) { $annotCounter = $1; }
}
close (IN) or die "Cannot close $annotFile : $!";
- In this case, I would move the go_annotation.ace file to the go/ directory as I do for the gp_annotation.ace file above and we would start from the next highest number after consulting the go_annotation.ace file.
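The sequential lookup described above could be sketched roughly as follows. This is only an illustration: the sub name highest_annotation_id is hypothetical, and it assumes every file uses the same GO_annotation : "<digits>" header matched by the regex above. Unlike the existing snippet, which keeps the last ID it sees, this version keeps the largest one, so it does not rely on the files being in ascending order.

```perl
use strict;
use warnings;

# Hypothetical helper: return the highest GO_annotation number found
# across the given .ace files (0 if no annotation header is found).
sub highest_annotation_id {
    my @files = @_;
    my $highest = 0;
    for my $file (@files) {
        open(my $in, '<', $file) or die "Cannot open $file : $!";
        while (my $line = <$in>) {
            if ($line =~ m/GO_annotation : "(\d+)"/) {
                my $id = $1 + 0;                  # force numeric comparison
                $highest = $id if $id > $highest;
            }
        }
        close($in) or die "Cannot close $file : $!";
    }
    return $highest;
}
```

The GO-CAM annotations would then start at highest_annotation_id(@aceFiles) + 1, where @aceFiles (an assumed name) holds the gp_annotation.ace and go_annotation.ace paths.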
SynGO Annotations
- Proposal: skip all annotations that are Assigned_by SynGO and get these from UniProt-GOA because the evidence codes are already mapped for us for GAF.
- The reason this matters is that SynGO uses evidence codes other than the three-letter codes and so their evidence codes need to be mapped to the traditional three-letter codes.
- However, we also need to double-check that all of the SynGO annotations in Noctua GPAD are also in Protein2GO, and confirm with Tony, Alex, Dustin, Paul, etc. how new SynGO annotations will enter the pipeline.
- In Protein2GO, SynGO annotations have Source 'SYNX'.
- Application of evidence codes looks to be different from GO guidelines.
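If the proposal to skip SynGO annotations is adopted, the filter might look something like the sketch below. It assumes the GPAD file is tab-delimited, that header and comment lines start with "!", and that the Assigned_by value in column 10 is exactly the string SynGO; the sub names are hypothetical.

```perl
use strict;
use warnings;

# Return true if a GPAD data line is assigned by SynGO (column 10).
sub is_syngo_line {
    my ($line) = @_;
    chomp $line;
    my @cols = split /\t/, $line, -1;    # -1 keeps trailing empty columns
    return defined($cols[9]) && $cols[9] eq 'SynGO';
}

# Keep only data lines (no "!" headers, no blank lines) that are not SynGO.
sub filter_gpad_lines {
    my @lines = @_;
    return grep { $_ !~ /^!/ && $_ =~ /\S/ && !is_syngo_line($_) } @lines;
}
```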
Columns and Mappings
Column 1: DB Prefix
- Ignore
Column 2: DB Object ID
- Populate Gene field in ?GO_annotation (no conversion needed)
Column 3: Relation
- Populate Annotation_relation field with RO identifier
- For now, need to map the text string to an RO identifier (no NOT annotations here, yet)
- Subroutine exists in goa gpad parsing script, but we need to add all of the relations
- And check on how NOT is handled
sub populateAnnotrelToRo {
  $annotToRo{'colocalizes_with'}                           = 'RO:0002325';
  $annotToRo{'contributes_to'}                             = 'RO:0002326';
  $annotToRo{'enables'}                                    = 'RO:0002327';
  $annotToRo{'involved_in'}                                = 'RO:0002331';
  $annotToRo{'part_of'}                                    = 'BFO:0000050';
  $annotToRo{'acts_upstream_of_or_within'}                 = 'RO:0002264';
  $annotToRo{'acts_upstream_of'}                           = 'RO:0002263';
  $annotToRo{'acts_upstream_of_or_within_negative_effect'} = 'RO:0004033';
  $annotToRo{'acts_upstream_of_or_within_positive_effect'} = 'RO:0004032';
  $annotToRo{'acts_upstream_of_negative_effect'}           = 'RO:0004035';
  $annotToRo{'acts_upstream_of_positive_effect'}           = 'RO:0004034';
} # sub populateAnnotrelToRo
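One possible way to use the relation table while keeping the NOT question open is sketched below. It assumes NOT, if it ever appears, would be pipe-separated from the relation in column 3 (for example NOT|part_of); that is an assumption to be confirmed against the actual file. The sub name parse_relation is hypothetical and the table is cut down to three entries for brevity.

```perl
use strict;
use warnings;

# Abbreviated copy of the relation table, for illustration only.
my %annotToRo = (
    'enables'     => 'RO:0002327',
    'involved_in' => 'RO:0002331',
    'part_of'     => 'BFO:0000050',
);

# Split a column 3 value into a NOT flag and an RO identifier.
# Dies on a relation with no mapping, so gaps in the table are noticed.
sub parse_relation {
    my ($value) = @_;
    my @parts = split /\|/, $value;
    my $is_not = grep { $_ eq 'NOT' } @parts;      # count of NOT tokens
    my ($relation) = grep { $_ ne 'NOT' } @parts;  # first non-NOT token
    die "No relation in '$value'" unless defined $relation;
    my $ro = $annotToRo{$relation}
        or die "No RO mapping for relation '$relation'";
    return ($is_not ? 1 : 0, $ro);
}
```

Dying on an unmapped relation is a deliberate choice in this sketch: a silently skipped relation would be harder to spot than a failed run.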
Column 4: GO ID
- GO_term (no conversion needed)
Column 5: Reference
- Convert PMID to WBPaper
- Populate GO_REF as usual (see other gpad parsing script)
- PAINT_REF - convert to appropriate PMID (but why do we have these and are they in the GO database somewhere?)
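The reference handling might be sketched as below. The lookup hash %pmidToPaper and its single entry are placeholders, not real mappings; in practice the table would be loaded from wherever the PMID-to-WBPaper mapping is kept. Passing GO_REF values through unchanged is also an assumption, since the existing gpad parsing script may treat them differently, and PAINT_REF is left unresolved because its conversion is still an open question.

```perl
use strict;
use warnings;

# Placeholder lookup table (hypothetical values).
my %pmidToPaper = ( '12345678' => 'WBPaper00000001' );

# Convert a column 5 reference; returns undef when nothing can be resolved.
sub convert_reference {
    my ($ref) = @_;
    if ($ref =~ /^PMID:(\d+)$/) {
        return $pmidToPaper{$1};           # undef if the PMID is unknown
    }
    return $ref if $ref =~ /^GO_REF:/;     # passed through in this sketch
    return;                                # PAINT_REF and anything else
}
```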
Column 6: Evidence Code
- Convert ECO ID to 3-letter GO code
- This part will probably need to be newly written: Tony maps the ECO ID to the GO code in an annotation properties field, but this GPAD currently doesn't have that field.
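A minimal sketch of the ECO-to-GO-code lookup follows. The table is deliberately partial and lists only a few of the standard default mappings; it should be checked against, and ideally loaded from, the official GO evidence code mapping rather than maintained by hand. The names %ecoToGo and eco_to_go_code are hypothetical.

```perl
use strict;
use warnings;

# Partial table of default ECO ID to GO evidence code mappings.
my %ecoToGo = (
    'ECO:0000269' => 'EXP',
    'ECO:0000314' => 'IDA',
    'ECO:0000315' => 'IMP',
    'ECO:0000316' => 'IGI',
    'ECO:0000353' => 'IPI',
    'ECO:0000270' => 'IEP',
    'ECO:0000250' => 'ISS',
    'ECO:0000318' => 'IBA',
);

# Return the three-letter GO code, or undef if the ECO ID is not mapped.
sub eco_to_go_code {
    my ($eco) = @_;
    return $ecoToGo{$eco};
}
```

More specific ECO IDs, such as those SynGO uses, would come back undefined here and would need to be mapped up to one of the three-letter codes separately.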
Column 7: With/From
- What’s here from go_gpad parser:
UniProtKB
UniProtKB-KW
UniProtKB-SubCell:
UniPathway
UniRule
WB:WBGene
WB:WBVar
InterPro
EC
PANTHER
PomBase
SGD
UniProt (this needs to be fixed in our GAF!! - see ‘Modification to our GAF file output’ email thread with Kevin Howe beginning on 2018-07-17; should be okay in WS267)
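As a rough illustration of checking With/From values against the prefixes listed above, the sketch below splits a column 7 value and sorts the entries into recognized and unrecognized. It is not the go_gpad parser's actual logic. It assumes entries are separated by "|" or ",", and the trailing colons added to the prefixes are an assumption made to avoid partial matches. The bare UniProt prefix is left out on purpose so that such entries are flagged.

```perl
use strict;
use warnings;

# Prefixes taken from the list above, with colons added (assumption).
my @knownPrefixes = (
    'UniProtKB:', 'UniProtKB-KW:', 'UniProtKB-SubCell:', 'UniPathway:',
    'UniRule:', 'WB:WBGene', 'WB:WBVar', 'InterPro:', 'EC:', 'PANTHER:',
    'PomBase:', 'SGD:',
);

# Split a With/From value and return two array refs:
# entries with a recognized prefix, and everything else.
sub split_with_from {
    my ($value) = @_;
    my (@known, @unknown);
    for my $entry (grep { length } split /[|,]/, $value) {
        if (grep { index($entry, $_) == 0 } @knownPrefixes) {
            push @known, $entry;
        }
        else {
            push @unknown, $entry;
        }
    }
    return (\@known, \@unknown);
}
```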
Column 8: Interacting taxon
- Currently don’t have any entries for this, but could reuse code from go_gpad parser
Column 9: Date
- YYYYMMDD - (no conversion needed)
Column 10: Assigned_by
- WB
- (SynGO - Need to upload an ?Analysis object that has the relevant details - file done 2018-10-23)
Column 11: Annotation Extensions
- Parse as for go_gpad parsing script
- Note that these are all RO relations
- Also, SynGO is using UBERON for occurs_in and evidence codes like ‘knockout’ that don’t make sense for the papers they’re curating.
Column 12: Annotation Properties
- Nothing to convert here.
- At some point, we may want to think about capturing some of this metadata, but will need thorough review first.
- Check curator; I’m still not sure we’re handling this correctly when we import existing annotations into GO-CAM models.