Phenotype2GO Analysis
From WormBaseWiki
Named Genes with Phenotype2GO-Based Annotations
Gene Name | Phenotype2GO Annotation | Evidence Code | Experiment | Reference | Manual GO Annotation | Evidence Code | Comment | Recommend Keep Phenotype2GO Annotation? | Class of P2GO |
---|---|---|---|---|---|---|---|---|---|
hsf-1 | locomotion (GO:0040011) | IMP | RNAi in rrf-3 | 6395, Simmer et al., 2003, others | No manual annotations from 6395 | | May reflect role in regulating transcription - what are the targets? No manual annotations to locomotion-related terms | No | Possible downstream effect |
met-1 | locomotion (GO:0040011) | IMP | RNAi | 26635, Gottschalk et al., 2005 | No manual GO annotation from 26635 | | Role of met-1 not clear from this paper, but as a histone methyltransferase probably related to gene expression; manual annotations exist to cellular level processes and one vulval development annotation | No | Possible downstream effect |
sax-3 | locomotion (GO:0040011) | IMP | RNAi | 5654, Kamath et al., 2003 | No manual GO annotation from 5654 | | sax-3 plays a role in neuron and muscle migration guidance; manual annotations exist to axon guidance, neuron migration | No | Downstream effect |
skn-1 | embryonic development (GO:0009790) | IMP | RNAi | 28492, Maduro et al., 2007 | endodermal cell fate specification | IMP | P2GO annotation is correct, but less granular than manual annotation | Not necessary - manual exists | High-level term |
smg-8 | locomotion (GO:0040011) | IMP | RNAi | 37111, Izumi et al., 2010 | No manual annotation from 37111 | | smg-8 encodes a novel protein; only CC annotations exist | No | Downstream effect: phenotype used as output for more general cellular process; effect on locomotion apparently a result of unc-54 transcript stabilization; also incorrect evidence code |
mes-4 | determination of adult lifespan (GO:0008340) | IMP | RNAi | 33449, Curran et al., 2009 | No manual GO annotation for mes-4 from 33449 | | Effect on lifespan may be due to role in regulation of germline transcription, germ cell survival | No | Possible downstream effect - MES-4 is required for germ cell fate specification |
gld-3 | apoptosis (obsolete) | IMP | RNAi | 38381 | maybe a germ cell development annotation? | IMP | |||
fem-3 | embryonic development ending in birth or egg hatching (GO:0009792) | IMP | RNAi | 5599 | large-scale screen | small-scale experiment supports this but is not yet annotated (35459) |
- Note: at least one paper, WBPaper00034662, is included in the rnai.go annotations, but those mappings were made in close collaboration between Gary and me, so the annotations for that paper should be kept.
Goals and strategy moving forward
- Remove any annotation redundancy in the pipeline, i.e., cases where papers have been curated both manually and via the automated pipeline
- Determine how many genes only have Phenotype2GO-based GO annotations - this will give us an idea of what we'd lose if we pull the Phenotype2GO annotations
- Based on the gene counts from the previous item, devise a strategy for going forward. Options: remove all automated annotations; keep some but remove the most egregious mappings and change the evidence code to IEA; put non-redundant but automated IMP or IEA annotations in a separate folder in GO (follow up with CM on this); prioritize genes for manual curation; or any combination of the above, or other ideas?
- Stop Phenotype2GO pipeline for now until we get a handle on if, and how, we can better incorporate such annotations - one thought if we wanted to keep any would be to move the pipeline to Caltech, i.e. perform the mappings for new annotations on a regular basis at Caltech and then upload those annotations in bulk to Protein2GO.
From the Phenotype2GO (P2GO) annotation file we need the following information:
- How many unique genes have P2GO annotations
- How many of these genes have a manual (WB or UniProtKB) or IEA Biological Process annotation
- How many genes have only P2GO annotations (will be obtained from the above)
- Input files:
- manual.go
- electronic.go
- rnai.go
- variation???
- Rough specs for some scripts to help generate numbers and hopefully strategy
- From rnai.go file generate list of unique WBGene IDs
- Using list of unique WBGene IDs from rnai.go, check manual.go file for overlapping WBGenes with a P in Column 9
- Using list of unique WBGene IDs from rnai.go, check electronic.go file for overlapping WBGenes with a P in Column 9
- For each file checked, output the list of WBGene IDs that overlap and the remainder, i.e., the list of genes that DO NOT overlap
- Compare each output list and generate:
- List of genes with rnai.go annotations but no annotations in either manual.go or electronic.go (the intersection of the two DO NOT overlap lists above)
- List of genes with rnai.go annotations and no manual.go annotations, but with electronic.go annotations (i.e., no manual, but electronic)
- Sort list of WBGenes in each file based on number of unique phenotype associations in phenotype gaf, descending order - this file would list the WBGene ID and the number of Phenotypes
- This will give us a list of genes sorted according to their 'phenotype' information content and might help prioritize future curation
- Do the same thing with the UniProtKB annotation file?
- external groups, i.e. UniProtKB??? Need to convert UniProtKB ids to WBGene IDs??
- Using list of unique WBGene IDs from rnai.go, check uniprotkb.go file for overlapping WBGenes with a P in Column 9
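The rough specs above could be sketched in Python roughly as follows. This is only a starting point, not the actual pipeline code. It assumes that rnai.go, manual.go, electronic.go and the phenotype file are tab-delimited GAF-style files with the WBGene ID in column 2, the annotated term in column 5 and the aspect in column 9, and that comment lines start with "!". All function names are made up for illustration.

```python
from collections import defaultdict


def read_rows(lines):
    """Yield tab-split rows, skipping blank lines and '!' comment lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line and not line.startswith("!"):
            yield line.split("\t")


def gene_ids(lines, aspect=None):
    """Return the unique column-2 IDs.

    If aspect is given (e.g. "P"), only rows with that value in
    column 9 are counted. Assumes GAF-style column order.
    """
    return {
        row[1]
        for row in read_rows(lines)
        if len(row) >= 9 and (aspect is None or row[8] == aspect)
    }


def partition(rnai_genes, other_genes):
    """Split the rnai.go genes into (overlap, remainder) against another file."""
    return rnai_genes & other_genes, rnai_genes - other_genes


def rank_by_phenotypes(genes, phenotype_lines):
    """Return (gene, count) pairs sorted by number of unique column-5
    terms in the phenotype file, descending; ties are broken by gene ID."""
    terms = defaultdict(set)
    for row in read_rows(phenotype_lines):
        if len(row) >= 5 and row[1] in genes:
            terms[row[1]].add(row[4])
    return sorted(((g, len(terms[g])) for g in genes),
                  key=lambda pair: (-pair[1], pair[0]))
```

To use it, each file would be opened and passed in as lines, e.g. `gene_ids(open("rnai.go"))` for the unique P2GO genes and `gene_ids(open("manual.go"), aspect="P")` for the manual Biological Process genes. Calling `partition` once against manual.go and once against electronic.go gives the overlap and DO NOT overlap lists. The intersection of the two remainders is the P2GO-only list, and the manual remainder intersected with the electronic overlap is the "no manual, but electronic" list. The UniProtKB file is not handled here, since its IDs would first need converting to WBGene IDs.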