Phenotype2GO Analysis
From WormBaseWiki
Revision as of 20:39, 20 June 2013 by Vanaukenk
Named Genes with Phenotype2GO-Based Annotations
Gene Name | Phenotype2GO Annotation | P2GO Evidence Code | Experiment | Reference | Manual GO Annotation | Manual Evidence Code | Comment | Recommend Keep Phenotype2GO Annotation? | Class of P2GO |
---|---|---|---|---|---|---|---|---|---|
hsf-1 | locomotion (GO:0040011) | IMP | RNAi in rrf-3 | 6395, Simmer et al., 2003, others | No manual annotations from 6395 | | May reflect role in regulating transcription - what are the targets? No manual annotations to locomotion-related terms | No | Possible downstream effect |
met-1 | locomotion (GO:0040011) | IMP | RNAi | 26635, Gottschalk, et al., 2005 | No manual GO annotation from 26635 | | Role of met-1 not clear from this paper, but as a histone methyltransferase probably related to gene expression; manual annotations exist to cellular level processes and one vulval development annotation | No | Possible downstream effect |
sax-3 | locomotion (GO:0040011) | IMP | RNAi | 5654, Kamath et al., 2003 | No manual GO annotation from 5654 | | sax-3 plays a role in neuron and muscle migration guidance, manual annotations exist to axon guidance, neuron migration | No | Downstream effect |
skn-1 | embryonic development (GO:0009790) | IMP | RNAi | 28492, Maduro et al., 2007 | endodermal cell fate specification | IMP | P2GO annotation is correct, but less granular than manual annotation | Not necessary - manual exists | High-level term |
smg-8 | locomotion (GO:0040011) | IMP | RNAi | 37111, Izumi et al., 2010 | No manual annotation from 37111 | | smg-8 encodes a novel protein; only CC annotations exist | No | Downstream effect: phenotype used as output for more general cellular process: effect on locomotion apparently a result of unc-54 transcript stabilization; also incorrect evidence code |
mes-4 | determination of adult lifespan (GO:0008430) | IMP | RNAi | 33449, Curran et al., 2009 | No manual GO annotation for mes-4 from 33449 | | Effect on lifespan may be due to role in regulation of germline transcription, germ cell survival | No | Possible downstream effect - MES-4 is required for germ cell fate specification |
gld-3 | apoptosis (obsolete) | IMP | RNAi | 38381 | maybe a germ cell development annotation? | IMP | | | |
fem-3 | embryonic development ending in birth or egg hatching (GO:0009792) | IMP | RNAi | 5599 | large-scale screen | small-scale experiment supports this but is not yet annotated (35459) |
- Note: at least one paper, WBPaper00034662, is included in the rnai.go annotations, but those mappings were made in close collaboration between Gary and me, so the annotations for that paper should be kept.
Goals and strategy moving forward
- Remove any annotation redundancy in the pipeline, i.e., cases where a paper has been curated both manually and via the automated pipeline
- Determine how many genes have only Phenotype2GO-based GO annotations - this will give us an idea of what we'd lose if we pull the Phenotype2GO annotations
- Based on the gene count above, devise a strategy for going forward. Options: remove all automated annotations; keep some, but remove the most egregious mappings and change the evidence code to IEA; put non-redundant but automated IMP or IEA annotations in a separate folder in GO (follow up with CM on this); prioritize genes for manual curation; or any combination of the above. Other ideas?
From the Phenotype2GO (P2GO) annotation file we need the following information:
- How many unique genes have P2GO annotations
- How many of these genes have a manual (WB or UniProtKB) or IEA Biological Process annotation
- How many genes have only P2GO annotations (will be obtained from the above)
- Input files:
- manual.go
- electronic.go
- rnai.go
- variation???
- external groups, e.g., UniProtKB? UniProtKB IDs may need to be converted to WBGene IDs
- Rough specs for some scripts to help generate numbers and hopefully strategy
- From rnai.go file generate list of unique WBGene IDs
- Using list of unique WBGene IDs from rnai.go, check manual.go file for overlapping WBGenes with a P in Column 9
- Using list of unique WBGene IDs from rnai.go, check uniprotkb.go file for overlapping WBGenes with a P in Column 9
- Using list of unique WBGene IDs from rnai.go, check electronic.go file for overlapping WBGenes with a P in Column 9
- For each file checked, output the list of WBGene IDs that do NOT overlap
- Compare the output lists and determine:
- Genes with rnai.go annotations but no annotations in any of the three other files
- Genes with rnai.go annotations but no annotations in each pair of files (e.g., no manual and no uniprotkb, but electronic)
- Sort the list of WBGenes from each output by the number of unique phenotype associations in the phenotype GAF, in descending order
- This will give us a list of genes sorted according to their 'phenotype' information content and might help prioritize genes for manual curation
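The rough specs above could be sketched in Python along the following lines. This is only a starting point, not an existing WormBase script. It assumes that all input files are tab-delimited GAF files with the WBGene ID in column 2, the annotated term in column 5, and the aspect (P, F or C) in column 9, and that the phenotype GAF carries the phenotype ID in column 5. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def read_gaf_rows(lines):
    """Yield tab-split rows from GAF-style lines, skipping blank and '!' comment lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue
        yield line.split("\t")


def genes_with_aspect(lines, aspect=None):
    """Return the set of gene IDs (column 2), optionally restricted to rows
    whose aspect (column 9) equals `aspect`. Assumes GAF column layout."""
    genes = set()
    for row in read_gaf_rows(lines):
        if len(row) < 9:
            continue  # malformed or truncated row
        if aspect is None or row[8] == aspect:
            genes.add(row[1])
    return genes


def p2go_only_report(rnai_lines, other_files):
    """Compare rnai.go genes against each other file.

    other_files: dict mapping a label (e.g. 'manual') to that file's lines.
    Returns (p2go_genes, non_overlap, only_p2go), where non_overlap[label] is
    the set of rnai.go genes with no 'P' annotation in that file, and only_p2go
    is the set of rnai.go genes with no 'P' annotation in any of the files.
    """
    p2go_genes = genes_with_aspect(rnai_lines)
    non_overlap = {
        label: p2go_genes - genes_with_aspect(lines, "P")
        for label, lines in other_files.items()
    }
    only_p2go = set(p2go_genes)
    for missing in non_overlap.values():
        only_p2go &= missing
    return p2go_genes, non_overlap, only_p2go


def pairwise_missing(non_overlap):
    """For each pair of files, return the genes missing from both files of the pair."""
    return {
        (a, b): non_overlap[a] & non_overlap[b]
        for a, b in combinations(sorted(non_overlap), 2)
    }


def rank_by_phenotype_count(genes, phenotype_lines):
    """Sort genes by number of unique phenotype IDs (column 5 of the phenotype
    GAF, an assumption), descending; ties are broken by gene ID."""
    phenotypes = defaultdict(set)
    for row in read_gaf_rows(phenotype_lines):
        if len(row) >= 5 and row[1] in genes:
            phenotypes[row[1]].add(row[4])
    return sorted(
        ((gene, len(phenotypes[gene])) for gene in genes),
        key=lambda item: (-item[1], item[0]),
    )
```

Each function takes any iterable of lines, so an open file handle can be passed directly, e.g. `p2go_only_report(open("rnai.go"), {"manual": open("manual.go"), "electronic": open("electronic.go")})`. If UniProtKB annotations are keyed by UniProtKB accession rather than WBGene ID, an ID conversion step would be needed before the comparison.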