Phenotype2GO Analysis

From WormBaseWiki

Named Genes with Phenotype2GO-Based Annotations

| Gene Name | Phenotype2GO Annotation | Evidence Code (P2GO) | Experiment | Reference | Manual GO Annotation | Evidence Code (Manual) | Comment | Recommend Keep Phenotype2GO Annotation? | Class of P2GO |
|---|---|---|---|---|---|---|---|---|---|
| hsf-1 | locomotion (GO:0040011) | IMP | RNAi in rrf-3 | 6395, Simmer et al., 2003, others | No manual annotations from 6395 | | May reflect role in regulating transcription - what are the targets? No manual annotations to locomotion-related terms | No | Possible downstream effect |
| met-1 | locomotion (GO:0040011) | IMP | RNAi | 26635, Gottschalk et al., 2005 | No manual GO annotation from 26635 | | Role of met-1 not clear from this paper, but as a histone methyltransferase it is probably related to gene expression; manual annotations exist to cellular-level processes, plus one vulval development annotation | No | Possible downstream effect |
| sax-3 | locomotion (GO:0040011) | IMP | RNAi | 5654, Kamath et al., 2003 | No manual GO annotation from 5654 | | sax-3 plays a role in neuron and muscle migration guidance; manual annotations exist to axon guidance, neuron migration | No | Downstream effect |
| skn-1 | embryonic development (GO:0009790) | IMP | RNAi | 28492, Maduro et al., 2007 | endodermal cell fate specification | IMP | P2GO annotation is correct, but less granular than manual annotation | Not necessary - manual exists | High-level term |
| smg-8 | locomotion (GO:0040011) | IMP | RNAi | 37111, Izumi et al., 2010 | No manual annotation from 37111 | | smg-8 encodes a novel protein; only CC annotations exist | No | Downstream effect: phenotype used as output for a more general cellular process; effect on locomotion apparently a result of unc-54 transcript stabilization; also incorrect evidence code |
| mes-4 | determination of adult lifespan (GO:0008340) | IMP | RNAi | 33449, Curran et al., 2009 | No manual GO annotation for mes-4 from 33449 | | Effect on lifespan may be due to role in regulation of germline transcription, germ cell survival | No | Possible downstream effect - MES-4 is required for germ cell fate specification |
| gld-3 | apoptosis (obsolete) | IMP | RNAi | 38381 | maybe a germ cell development annotation? | IMP | | | |
| fem-3 | embryonic development ending in birth or egg hatching (GO:0009792) | IMP | RNAi | 5599 | | | Large-scale screen; a small-scale experiment supports this but is not yet annotated (35459) | | |

  • Note: at least one paper, WBPaper00034662, is included in the rnai.go annotations, but those mappings were made in close collaboration between Gary and me, so the annotations for that paper should be kept.

Goals and strategy moving forward

  1. Remove any annotation redundancy in the pipeline, i.e., cases where papers have been curated both manually and via the automated pipeline
  2. Determine how many genes only have Phenotype2GO-based GO annotations - this will give us an idea of what we'd lose if we pull the Phenotype2GO annotations
  3. Based on #2, devise a strategy for going forward. Options: remove all automated annotations; keep some, but remove the most egregious mappings and change the evidence code (EC) to IEA; put non-redundant but automated IMP or IEA annotations in a separate folder in GO (follow up with CM on this); prioritize genes for manual curation; any combination of the above, or other ideas?
  4. Stop the Phenotype2GO pipeline for now, until we get a handle on whether, and how, we can better incorporate such annotations. One thought, if we wanted to keep any, would be to move the pipeline to Caltech, i.e., perform the mappings for new annotations on a regular basis at Caltech and then upload those annotations in bulk to Protein2GO.


From the Phenotype2GO (P2GO) annotation file we need the following information:

  • How many unique genes have P2GO annotations
  • How many of these genes have a manual (WB or UniProtKB) or IEA Biological Process annotation
  • How many genes have only P2GO annotations (will be obtained from the above)
  • Input files:
    • manual.go
    • electronic.go
    • rnai.go
    • variation???
  • Rough specs for some scripts to help generate numbers and hopefully strategy
    • From rnai.go file generate list of unique WBGene IDs
    • Using list of unique WBGene IDs from rnai.go, check manual.go file for overlapping WBGenes with a P in Column 9
    • Using list of unique WBGene IDs from rnai.go, check electronic.go file for overlapping WBGenes with a P in Column 9
    • For each file checked, output the list of WBGene IDs that overlap and the remainder, i.e., the list of genes that DO NOT overlap
    • Compare each output list and generate:
      • List of genes in rnai.go with no annotations in either manual.go or electronic.go (intersection of the two DO NOT overlap lists above)
      • List of genes in rnai.go with no annotations in manual.go but with annotations in electronic.go (i.e., no manual, but electronic)
    • Sort the list of WBGenes in each output file by number of unique phenotype associations in the phenotype GAF, in descending order; the resulting file would list each WBGene ID and its number of phenotypes
      • This will give us a list of genes sorted according to their 'phenotype' information content and might help prioritize future curation
  • Do the same thing with the UniProtKB annotation file?
    • external groups, i.e. UniProtKB??? Need to convert UniProtKB ids to WBGene IDs??
    • Using list of unique WBGene IDs from rnai.go, check uniprotkb.go file for overlapping WBGenes with a P in Column 9
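
The rough specs above could be sketched in Python along the following lines. This is only a starting point, and it rests on assumptions not stated on this page: that manual.go, electronic.go, rnai.go and the phenotype GAF are tab-delimited GAF-style files with '!' comment lines, that the WBGene ID is in Column 2, and that the GO or phenotype term ID is in Column 5. The function names (gene_ids, partition, rank_by_phenotype_count) are made up for illustration.

```python
from collections import defaultdict


def read_rows(lines):
    """Yield tab-split rows, skipping blank lines and '!' comment lines."""
    for line in lines:
        line = line.rstrip("\n")
        if line and not line.startswith("!"):
            yield line.split("\t")


def gene_ids(lines, aspect=None):
    """Return the set of unique gene IDs (assumed to be in Column 2).

    If `aspect` is given (e.g. "P"), keep only rows whose Column 9 matches it.
    Rows with fewer than 9 columns are ignored.
    """
    return {
        row[1]
        for row in read_rows(lines)
        if len(row) >= 9 and (aspect is None or row[8] == aspect)
    }


def partition(p2go, manual, electronic):
    """Split the P2GO gene set by overlap with the manual and electronic sets."""
    return {
        "manual_overlap": p2go & manual,
        "manual_no_overlap": p2go - manual,
        "electronic_overlap": p2go & electronic,
        "electronic_no_overlap": p2go - electronic,
        # Intersection of the two "do not overlap" lists: P2GO is all we have.
        "p2go_only": p2go - manual - electronic,
        # No manual annotation, but an electronic one exists.
        "electronic_but_no_manual": (p2go & electronic) - manual,
    }


def rank_by_phenotype_count(genes, phenotype_lines):
    """Sort genes by number of unique phenotype terms (assumed Column 5), descending.

    Ties are broken by gene ID so the output order is reproducible.
    """
    phenotypes = defaultdict(set)
    for row in read_rows(phenotype_lines):
        if len(row) >= 5 and row[1] in genes:
            phenotypes[row[1]].add(row[4])
    counts = [(gene, len(phenotypes.get(gene, ()))) for gene in genes]
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))
```

Each function takes any iterable of lines, so an open file handle can be passed directly, e.g. gene_ids(open("rnai.go")) for the full P2GO gene list and gene_ids(open("manual.go"), aspect="P") for genes with a manual Biological Process annotation. The UniProtKB file would need its IDs converted to WBGene IDs before it could be fed through the same functions.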

