Specifications for Converting GO-CAM GPAD to .ace File

Source File Location

GO_Annotation IDs - finding the right number from which to start

  • As with the OA annotations, we need to consult the existing .ace GO annotation files in sequence to find the correct starting number for the GO_annotation objects generated from this GPAD file.
    • See code at lines 45-49 of /home/postgres/work/citace_upload/go_curation/get_go_annotation_ace.pm
 my $annotCounter = 0;
 my $annotFile = '/home/acedb/kimberly/citace_upload/go/gp_annotation.ace';
 open (IN, "<$annotFile") or die "Cannot open $annotFile : $!";
 while (my $line = <IN>) { if ($line =~ m/GO_annotation : "(\d+)"/) { $annotCounter = $1; } }
 close (IN) or die "Cannot close $annotFile : $!";
  • In this case, I would move the go_annotation.ace file to the go/ directory, as I do for the gp_annotation.ace file above, and start numbering from the next number after the highest ID found in go_annotation.ace.
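
The counter lookup described above could be sketched as follows, generalized to several .ace files. The helper name `highest_annotation_id` is hypothetical. Unlike the loop quoted from get_go_annotation_ace.pm, which keeps the last ID seen, this sketch keeps the largest ID, on the assumption that IDs are not guaranteed to appear in ascending order.

```perl
use strict;
use warnings;

# Hypothetical helper: return the highest GO_annotation ID found in the
# given .ace files, or 0 if no GO_annotation object is found.
sub highest_annotation_id {
    my @files = @_;
    my $max = 0;
    for my $file (@files) {
        open(my $in, '<', $file) or die "Cannot open $file : $!";
        while (my $line = <$in>) {
            # Same pattern as the existing code; "+ 0" drops leading zeros.
            if ($line =~ m/GO_annotation : "(\d+)"/) {
                my $id = $1 + 0;
                $max = $id if $id > $max;
            }
        }
        close($in) or die "Cannot close $file : $!";
    }
    return $max;
}
```

New objects from the GPAD file would then start at the returned value plus one, for example after passing both gp_annotation.ace and go_annotation.ace to the helper.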

SynGO Annotations

  • Proposal: skip all annotations with Assigned_by SynGO and take them from UniProt-GOA instead, because the evidence codes there are already mapped for us for the GAF.
    • This matters because SynGO uses evidence codes other than the traditional three-letter GO codes, so its codes would otherwise have to be mapped to the three-letter codes.
    • However, we also need to double-check that all of the SynGO annotations in the Noctua GPAD are also in Protein2GO, and confirm with Tony, Alex, Dustin, Paul, etc. how new SynGO annotations will enter the pipeline.
    • In Protein2GO, SynGO annotations have Source 'SYNX'.
    • SynGO's application of evidence codes appears to differ from GO guidelines.
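
If the proposal is adopted, the filter could look roughly like this. It is a sketch only: the helper name `skip_gpad_line` is hypothetical, and it assumes the GPAD is tab-delimited, that header lines start with `!`, and that the Assigned_by value in column 10 is the exact string `SynGO`.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a GPAD line should not be converted.
# Skips header/comment lines, blank lines, and SynGO-assigned rows.
sub skip_gpad_line {
    my ($line) = @_;
    return 1 if $line =~ /^!/ || $line !~ /\S/;
    chomp $line;
    my @cols = split /\t/, $line, -1;      # -1 keeps trailing empty columns
    my $assigned_by = $cols[9] // '';      # column 10: Assigned_by
    return $assigned_by eq 'SynGO' ? 1 : 0;
}
```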

Columns and Mappings

Column 1: DB Prefix

  • Ignore

Column 2: DB Object ID

  • Populate Gene field in ?GO_annotation (no conversion needed)

Column 3: Relation

  • Populate Annotation_relation field with RO identifier
    • For now, need to map the text string to an RO identifier (no NOT annotations here, yet)
    • Subroutine exists in goa gpad parsing script, but we need to add all of the relations
    • And check on how NOT is handled
 sub populateAnnotrelToRo {
 $annotToRo{'colocalizes_with'} = 'RO:0002325';
 $annotToRo{'contributes_to'}   = 'RO:0002326';
 $annotToRo{'enables'}          = 'RO:0002327';
 $annotToRo{'involved_in'}      = 'RO:0002331';
 $annotToRo{'part_of'}          = 'BFO:0000050';
 $annotToRo{'acts_upstream_of_or_within'}   = 'RO:0002264';
 $annotToRo{'acts_upstream_of'}   = 'RO:0002263';
 $annotToRo{'acts_upstream_of_or_within_negative_effect'}   = 'RO:0004033';
 $annotToRo{'acts_upstream_of_or_within_positive_effect'}   = 'RO:0004032';
 $annotToRo{'acts_upstream_of_negative_effect'}   = 'RO:0004035';
 $annotToRo{'acts_upstream_of_positive_effect'}   = 'RO:0004034';
 } # sub populateAnnotrelToRo
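
Because the relation list is still incomplete, it may help to make unmapped relations visible rather than silently writing an empty Annotation_relation. A minimal sketch, using a hypothetical helper `relation_to_ro` and only three entries from the table above for brevity:

```perl
use strict;
use warnings;

# Subset of the mapping above, repeated so this sketch is self-contained.
my %annotToRo = (
    'enables'     => 'RO:0002327',
    'involved_in' => 'RO:0002331',
    'part_of'     => 'BFO:0000050',
);

# Hypothetical helper: return the RO/BFO identifier for a relation string,
# or warn and return undef when the relation has no mapping yet.
sub relation_to_ro {
    my ($relation) = @_;
    return $annotToRo{$relation} if exists $annotToRo{$relation};
    warn "No RO identifier mapped for relation '$relation'\n";
    return;
}
```

How NOT annotations should be represented is still open, so the sketch does not attempt to handle them.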

Column 4: GO ID

  • GO_term (no conversion needed)

Column 5: Reference

  • Convert PMID to WBPaper
  • Convert DOI to WBPaper
  • Populate GO_REF as usual
  • PAINT_REF - convert to the appropriate WBPaper. (Open question: why do we have annotations with PAINT_REF, and are they in the GO database somewhere? They need to be updated in the GO database.)
 See lines 217 - 228 in /home/acedb/kimberly/citace_upload/go/gpad2ace/2018_October/go_gpad_parser.pl
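
The PMID and DOI conversions amount to a prefix check plus a lookup. The sketch below is illustrative only and is not the code at the lines cited above: the helper `reference_to_wbpaper`, both lookup tables, and every ID in them are made up, and in practice the tables would be loaded from the paper database.

```perl
use strict;
use warnings;

# Hypothetical lookup tables with made-up IDs, for illustration only.
my %pmidToPaper = ( '12345678'        => 'WBPaper00000001' );
my %doiToPaper  = ( '10.1000/example' => 'WBPaper00000002' );

# Hypothetical helper: map a GPAD reference to a WBPaper ID.
# Returns undef when the reference is not a PMID/DOI or has no mapping.
sub reference_to_wbpaper {
    my ($ref) = @_;
    if ($ref =~ /^PMID:(\d+)$/) { return $pmidToPaper{$1}; }
    if ($ref =~ /^DOI:(.+)$/i)  { return $doiToPaper{$1}; }
    return;    # GO_REF and PAINT_REF are handled separately
}
```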

Column 6: Evidence Code

  • Convert ECO ID to 3-letter GO code
    • This part will probably need to be written from scratch: Tony maps the ECO ID to the GO code in an annotation properties field, but this GPAD currently lacks it.
 ECO:0000250 ISS
 ECO:0000270 IEP
 ECO:0000304 TAS
 ECO:0000307 ND
 ECO:0000314 IDA
 ECO:0000315 IMP
 ECO:0000316 IGI
 ECO:0000318 IBA
 ECO:0000352 ???
 ECO:0000353 IPI
 ECO:0000501 IEA
 ECO:0005611 IDA (but check these as ECO definition is not quite clear/familiar)
 ECO:0007007 HEP
 ECO:0007293 IEA (also check this one as it is: experimental evidence used in automatic assertion)
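
The table above translates directly into a hash. In this sketch the helper name `eco_to_go_code` is hypothetical, ECO:0000352 is deliberately left unmapped because its GO code is still undecided, and the two entries flagged for checking are included only provisionally.

```perl
use strict;
use warnings;

my %ecoToGo = (
    'ECO:0000250' => 'ISS',
    'ECO:0000270' => 'IEP',
    'ECO:0000304' => 'TAS',
    'ECO:0000307' => 'ND',
    'ECO:0000314' => 'IDA',
    'ECO:0000315' => 'IMP',
    'ECO:0000316' => 'IGI',
    'ECO:0000318' => 'IBA',
    'ECO:0000353' => 'IPI',
    'ECO:0000501' => 'IEA',
    'ECO:0005611' => 'IDA',    # provisional: ECO definition still to be checked
    'ECO:0007007' => 'HEP',
    'ECO:0007293' => 'IEA',    # provisional: still to be checked
);
# ECO:0000352 is intentionally absent: no GO code has been agreed on yet.

# Hypothetical helper: return the three-letter GO code, or undef if unmapped.
sub eco_to_go_code {
    my ($eco) = @_;
    return $ecoToGo{$eco};
}
```

Returning undef for unmapped IDs lets the caller decide whether to skip the annotation or stop and report it.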

Column 7: With/From

  • What’s here from go_gpad parser:
    • UniProtKB
    • UniProtKB-KW
    • UniProtKB-SubCell:
    • UniPathway
    • UniRule
    • WB:WBGene
    • WB:WBVar
    • InterPro
    • EC
    • PANTHER
    • PomBase
    • SGD
    • UniProt (this needs to be fixed in our GAF!! - see ‘Modification to our GAF file output’ email thread with Kevin Howe beginning on 2018-07-17; should be okay in WS267)

Column 8: Interacting taxon

  • Currently don’t have any entries for this, but could reuse code from go_gpad parser

Column 9: Date

  • YYYYMMDD - (no conversion needed)

Column 10: Assigned_by

  • WB
    • (SynGO - Need to upload an ?Analysis object that has the relevant details - file done 2018-10-23)

Column 11: Annotation Extensions

  • Parse as for go_gpad parsing script
    • Note that these are all RO relations
      • Also, SynGO is using UBERON for occurs_in and evidence codes like ‘knockout’ that don’t make sense for the papers they’re curating.

Column 12: Annotation Properties

  • Nothing to convert here.
  • At some point, we may want to think about capturing some of this metadata, but will need thorough review first.
  • Check curator; I’m still not sure we’re handling this correctly when we import existing annotations into GO-CAM models.
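
Putting the column list together, a GPAD row can be split into named fields before any of the conversions above are applied. This is a sketch under the assumption that each row is tab-delimited with exactly 12 columns; the helper `parse_gpad_line` and the field names are hypothetical labels derived from the column headings in this section.

```perl
use strict;
use warnings;

# Field names follow the column headings above (hypothetical labels).
my @GPAD_COLUMNS = qw(
    db db_object_id relation go_id reference evidence_code
    with_from interacting_taxon date assigned_by
    annotation_extensions annotation_properties
);

# Hypothetical helper: split one GPAD row into a hash reference
# keyed by column name. Dies if the row does not have 12 columns.
sub parse_gpad_line {
    my ($line) = @_;
    chomp $line;
    my @cols = split /\t/, $line, -1;    # -1 keeps trailing empty columns
    die "Expected 12 columns, got " . scalar(@cols) . "\n" unless @cols == 12;
    my %row;
    @row{@GPAD_COLUMNS} = @cols;
    return \%row;
}
```

Column 1 is still parsed here even though it is ignored downstream, so that the remaining field positions stay aligned with the column numbers used in this section.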