GO-CAM GPAD

From WormBaseWiki
Revision as of 20:40, 23 October 2018

Specifications for Converting GO-CAM GPAD to .ace File

GO_Annotation IDs - finding the right number from which to start

  • We will need to handle this the same way as the OA annotations: consult the .ace GO annotation files in sequence to find the correct starting number for the GO_annotation objects generated from this GPAD file.
    • See code at lines 45-49 of /home/postgres/work/citace_upload/go_curation/get_go_annotation_ace.pm
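 # Find the highest GO_annotation number already used in gp_annotation.ace
 my $annotCounter = 0;
 my $annotFile = '/home/acedb/kimberly/citace_upload/go/gp_annotation.ace';
 open (my $in, '<', $annotFile) or die "Cannot open $annotFile : $!";
 while (my $line = <$in>) {
   # keep the maximum rather than the last match, in case IDs are out of order
   if ($line =~ m/GO_annotation : "(\d+)"/ && $1 > $annotCounter) { $annotCounter = $1; }
 }
 close ($in) or die "Cannot close $annotFile : $!";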
  • In this case, I would move the go_annotation.ace file into the go/ directory, as I do for gp_annotation.ace above. We would then start from the next number after the highest one found in go_annotation.ace.
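
The sequential lookup described above can be sketched as one helper that scans each .ace file in order and returns the next free GO_annotation number. This is a minimal sketch, not the code in get_go_annotation_ace.pm. The name `next_go_annotation_id` is hypothetical. The sketch assumes IDs are plain integers in the `GO_annotation : "N"` form matched above. Any zero-padding or other formatting of the new ID is left to the existing code.

```perl
use strict;
use warnings;

# Hypothetical helper: scan each .ace file in the order given and return
# one more than the highest GO_annotation number found in any of them.
sub next_go_annotation_id {
    my @aceFiles = @_;
    my $maxId = 0;
    foreach my $aceFile (@aceFiles) {
        open (my $in, '<', $aceFile) or die "Cannot open $aceFile : $!";
        while (my $line = <$in>) {
            # Same pattern as the excerpt above, but track the maximum
            if ($line =~ m/GO_annotation : "(\d+)"/ && $1 > $maxId) { $maxId = $1; }
        }
        close ($in) or die "Cannot close $aceFile : $!";
    }
    return $maxId + 1;
}
```

For example, `next_go_annotation_id("$goDir/gp_annotation.ace", "$goDir/go_annotation.ace")` would give the first ID for this GPAD file. Here `$goDir` stands for the go/ directory mentioned above. Taking the maximum across all files means the result does not depend on the order in which the files are listed.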

SynGO Annotations

  • Proposal: skip all annotations with Assigned_by SynGO and get them from UniProt-GOA instead. In UniProt-GOA the evidence codes are already mapped to GAF codes for us.
    • This matters because SynGO uses evidence codes other than the traditional three-letter GO codes, so its evidence codes would need to be mapped to those codes.
    • However, we also need to double-check that all of the SynGO annotations in the Noctua GPAD are also in Protein2GO. We should also confirm with Tony, Alex, Dustin, Paul, etc. how new SynGO annotations will enter the pipeline.
    • In Protein2GO, SynGO annotations have Source 'SYNX'.
    • SynGO's application of evidence codes appears to differ from the GO guidelines.
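
If the proposal is adopted, the converter only needs to drop SynGO lines before doing anything else. The following is a minimal sketch under two assumptions: GPAD lines are tab-separated, and SynGO annotations carry the literal value `SynGO` in column 10 (Assigned_by). The function names are hypothetical.

```perl
use strict;
use warnings;

# Assumption: column 10 (Assigned_by) holds the literal string 'SynGO'.
sub is_syngo_annotation {
    my ($gpadLine) = @_;
    chomp $gpadLine;
    my @cols = split /\t/, $gpadLine, -1;   # -1 keeps trailing empty columns
    return 0 if @cols < 10;                 # too short to be an annotation line
    return $cols[9] eq 'SynGO' ? 1 : 0;     # column 10 is index 9
}

# Keep only annotation lines that should be converted here:
# drop '!' header lines and anything assigned by SynGO.
sub annotation_lines_to_convert {
    return grep { $_ !~ /^!/ && !is_syngo_annotation($_) } @_;
}
```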

Columns and Mappings
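
It is easier to keep the column numbering straight in code if each GPAD line is first split into named fields. This is a minimal sketch. Only the column order comes from the sections of this page. The field names and `parse_gpad_line` are hypothetical.

```perl
use strict;
use warnings;

# Column names in the order of the twelve columns described on this page.
my @GPAD_COLUMNS = qw(db db_object_id relation go_id reference evidence_code
                      with_from interacting_taxon date assigned_by
                      annotation_extensions annotation_properties);

# Split one tab-separated GPAD line into a hashref keyed by column name.
# Returns undef for '!' header/comment lines.
sub parse_gpad_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^!/;
    my @values = split /\t/, $line, -1;   # -1 keeps trailing empty columns
    my %record;
    @record{@GPAD_COLUMNS} = @values;     # missing columns come out undef
    return \%record;
}
```

Each column section can then work on one named field, such as `$record->{relation}` or `$record->{evidence_code}`, rather than a bare index.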

Column 1: DB Prefix

  • Ignore

Column 2: DB Object ID

  • Populate Gene field in ?GO_annotation (no conversion needed)

Column 3: Relation

  • Populate Annotation_relation field with RO identifier
    • For now, we need to map the relation text string to an RO identifier (there are no NOT annotations here yet)
    • A subroutine already exists in the goa gpad parsing script, but we need to add all of the relations
    • We also need to check how NOT is handled
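 my %annotToRo;    # GPAD relation text -> RO (or BFO) identifier
 sub populateAnnotrelToRo {
   $annotToRo{'colocalizes_with'} = 'RO:0002325';
   $annotToRo{'contributes_to'}   = 'RO:0002326';
   $annotToRo{'enables'}          = 'RO:0002327';
   $annotToRo{'involved_in'}      = 'RO:0002331';
   $annotToRo{'part_of'}          = 'BFO:0000050';
   # TODO: add the remaining relations used in the GO-CAM GPAD
 }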
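
One way to cover both open questions, unmapped relations and NOT, is a lookup that reports missing mappings and separates out a negation flag. This is a sketch under an assumption that still needs checking: a negated annotation is written as `NOT|relation` in column 3. The table repeats only the five mappings above, and `relation_to_ro` is a hypothetical name.

```perl
use strict;
use warnings;

# The five mappings from populateAnnotrelToRo; other relations still need adding.
my %relationToRo = (
    'colocalizes_with' => 'RO:0002325',
    'contributes_to'   => 'RO:0002326',
    'enables'          => 'RO:0002327',
    'involved_in'      => 'RO:0002331',
    'part_of'          => 'BFO:0000050',
);

# Assumption: negation appears as 'NOT|relation' in column 3.
# Returns (RO identifier or undef if unmapped, 1 if NOT else 0).
sub relation_to_ro {
    my ($relationField) = @_;
    my @parts = split /\|/, $relationField;
    my $isNot = (grep { $_ eq 'NOT' } @parts) ? 1 : 0;
    my ($relation) = grep { $_ ne 'NOT' } @parts;
    my $roId = defined $relation ? $relationToRo{$relation} : undef;
    warn "No RO mapping for relation '$relationField'\n" unless defined $roId;
    return ($roId, $isNot);
}
```

Warning on unmapped relations, rather than dying, makes it easy to collect the full list of relations that still need RO identifiers from a single run.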

Column 4: GO ID

  • Populate GO_term (no conversion needed)

Column 5: Reference

  • Convert PMID to WBPaper
  • Populate GO_REF as usual (see the other gpad parsing script)
  • PAINT_REF: convert to the appropriate PMID (but why do we have these, and are they in the GO database somewhere?)
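
The three reference cases can be handled by one dispatcher. This is a minimal sketch with several assumptions. The lookup tables are hypothetical hashrefs passed in by the caller: a PMID-to-WBPaper table keyed by the numeric PMID, and a PAINT_REF-to-PMID table. A PAINT_REF is assumed to be converted to a PMID first and then treated like any other PMID. GO_REF identifiers are assumed to pass through unchanged. `convert_reference` is a hypothetical name.

```perl
use strict;
use warnings;

# Returns the converted reference, or undef if it cannot be converted.
sub convert_reference {
    my ($ref, $pmidToWbpaper, $paintRefToPmid) = @_;
    if ($ref =~ /^PAINT_REF:/) {
        # Assumption: PAINT_REF -> PMID, then PMID -> WBPaper as usual
        my $pmid = $paintRefToPmid->{$ref};
        return undef unless defined $pmid;
        $ref = $pmid;
    }
    if ($ref =~ /^PMID:(\d+)$/) {
        return $pmidToWbpaper->{$1};        # undef if no WBPaper for this PMID
    }
    return $ref if $ref =~ /^GO_REF:\d+$/;  # GO_REF kept as-is
    return undef;                           # unrecognised reference type
}
```

Returning undef, rather than guessing, leaves the decision about unconvertible references (for example, writing them to an error log) to the caller.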

Column 6: Evidence Code

  • Convert ECO ID to 3-letter GO code
    • This part will probably need to be written from scratch. Tony maps ECO IDs to GO codes in an annotation properties field, but this GPAD currently doesn't have that field.
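
Since this GPAD carries no mapping of its own, the converter needs its own ECO-to-GO-code table. The following minimal sketch hard-codes only a few common manual-assertion codes to show the shape. A real converter should load a complete mapping instead, such as the GAF/ECO mapping file distributed with the Evidence & Conclusion Ontology. `eco_to_go_code` is a hypothetical name.

```perl
use strict;
use warnings;

# Partial ECO -> GO evidence code table (illustrative subset only).
my %ecoToGoCode = (
    'ECO:0000314' => 'IDA',
    'ECO:0000315' => 'IMP',
    'ECO:0000316' => 'IGI',
    'ECO:0000353' => 'IPI',
    'ECO:0000270' => 'IEP',
    'ECO:0000318' => 'IBA',
);

# Returns the three-letter GO code, or undef (with a warning) if unmapped.
sub eco_to_go_code {
    my ($ecoId) = @_;
    my $code = $ecoToGoCode{$ecoId};
    warn "No GO evidence code for $ecoId\n" unless defined $code;
    return $code;
}
```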

Column 7: With/From

  • Prefixes handled by the go_gpad parser:

UniProtKB UniProtKB-KW UniProtKB-SubCell: UniPathway UniRule WB:WBGene WB:WBVar InterPro EC PANTHER PomBase SGD UniProt (this needs to be fixed in our GAF! See the ‘Modification to our GAF file output’ email thread with Kevin Howe beginning on 2018-07-17; it should be okay in WS267)
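
A quick way to see whether this GPAD introduces With/From prefixes the go_gpad parser does not yet handle is to split the column and check each entry against the list above. This is a minimal sketch under two assumptions. First, entries are separated by `|` and `,` as in GAF. Second, each listed prefix can be matched as a leading string, with a `:` added where the list above omits it. `split_with_from` is a hypothetical name.

```perl
use strict;
use warnings;

# Leading strings for the prefixes listed above (colons added by assumption).
my @knownPrefixes = ('UniProtKB:', 'UniProtKB-KW:', 'UniProtKB-SubCell:',
                     'UniPathway:', 'UniRule:', 'WB:WBGene', 'WB:WBVar',
                     'InterPro:', 'EC:', 'PANTHER:', 'PomBase:', 'SGD:',
                     'UniProt:');

# Split a With/From value into entries and sort them into known/unknown.
# Returns (\@known, \@unknown).
sub split_with_from {
    my ($withFrom) = @_;
    my (@known, @unknown);
    foreach my $entry (grep { length } split /[|,]/, $withFrom) {
        if (grep { index($entry, $_) == 0 } @knownPrefixes) { push @known, $entry; }
        else                                                 { push @unknown, $entry; }
    }
    return (\@known, \@unknown);
}
```

Any entries that come back in the unknown list point to prefixes that would need new handling before the .ace output is generated.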

Column 8: Interacting taxon

  • There are currently no entries in this column, but we could reuse code from the go_gpad parser

Column 9: Date

  • YYYYMMDD (no conversion needed)

Column 10: Assigned_by

  • WB
    • (SynGO: we need to upload an ?Analysis object with the relevant details; file done 2018-10-23)

Column 11: Annotation Extensions

  • Parse as in the go_gpad parsing script
    • Note that these are all RO relations
      • Also, SynGO uses UBERON terms for occurs_in, and it uses evidence codes such as ‘knockout’ that don’t make sense for the papers they’re curating.
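
The extension column has a small grammar of its own. This minimal sketch assumes the usual GPAD/GAF convention: `|` separates alternative extension sets, `,` separates the relation(ID) terms within one set, and the relation may be a label such as `occurs_in` or an RO identifier. It does not reproduce the go_gpad parsing script. `parse_extensions` is a hypothetical name.

```perl
use strict;
use warnings;

# Parse e.g. 'occurs_in(CL:0000540),part_of(GO:0045202)|has_input(WB:WBGene00000001)'
# into a list of sets, each an arrayref of [relation, id] pairs.
sub parse_extensions {
    my ($field) = @_;
    my @sets;
    foreach my $set (grep { length } split /\|/, $field) {
        my @pairs;
        foreach my $term (split /,/, $set) {
            # relation label or RO id, then one target id in parentheses
            if ($term =~ /^\s*([\w:.-]+)\(([^()]+)\)\s*$/) { push @pairs, [$1, $2]; }
            else { warn "Unparsed extension term '$term'\n"; }
        }
        push @sets, \@pairs;
    }
    return @sets;
}
```

Once terms are in [relation, id] form, flagging unexpected target namespaces becomes a simple check on the ID prefix. The UBERON targets of SynGO's occurs_in extensions are one example of what such a check would catch.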

Column 12: Annotation Properties

  • Nothing to convert here.
  • At some point, we may want to think about capturing some of this metadata, but will need thorough review first.
  • Check curator; I’m still not sure we’re handling this correctly when we import existing annotations into GO-CAM models.
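
If we later want to review this metadata, such as the curator, the column first has to be split into key/value pairs. This is a minimal sketch under the assumption that the column uses `|`-separated `key=value` pairs. The key name `contributor` in the example is also an assumption. Values are collected per key because a key may repeat. `parse_annotation_properties` is a hypothetical name.

```perl
use strict;
use warnings;

# Assumes '|'-separated key=value pairs, e.g.
#   contributor=http://orcid.org/0000-0000-0000-0000|model-state=production
# Returns a hashref mapping each key to an arrayref of its values.
sub parse_annotation_properties {
    my ($field) = @_;
    my %props;
    foreach my $pair (grep { length } split /\|/, $field) {
        my ($key, $value) = split /=/, $pair, 2;   # limit 2: values may contain '='
        push @{ $props{$key} }, $value if defined $value;
    }
    return \%props;
}
```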