Difference between revisions of "Phenotype2GO Analysis"

Latest revision as of 21:25, 24 June 2013

Named Genes with Phenotype2GO-Based Annotations

{|
! Gene Name !! Phenotype2GO Annotation !! Evidence Code !! Experiment !! Reference !! Manual GO Annotation !! Evidence Code !! Comment !! Recommend Keep Phenotype2GO Annotation? !! Class of P2GO
|-
| hsf-1 || locomotion (GO:0040011) || IMP || RNAi in rrf-3 || 6395, Simmer et al., 2003, others || No manual annotations from 6395 || || May reflect role in regulating transcription - what are the targets? No manual annotations to locomotion-related terms || No || Possible downstream effect
|-
| met-1 || locomotion (GO:0040011) || IMP || RNAi || 26635, Gottschalk et al., 2005 || No manual GO annotation from 26635 || || Role of met-1 not clear from this paper, but as a histone methyltransferase probably related to gene expression; manual annotations exist to cellular level processes and one vulval development annotation || No || Possible downstream effect
|-
| sax-3 || locomotion (GO:0040011) || IMP || RNAi || 5654, Kamath et al., 2003 || No manual GO annotation from 5654 || || sax-3 plays a role in neuron and muscle migration guidance; manual annotations exist to axon guidance, neuron migration || No || Downstream effect
|-
| skn-1 || embryonic development (GO:0009790) || IMP || RNAi || 28492, Maduro et al., 2007 || endodermal cell fate specification || IMP || P2GO annotation is correct, but less granular than manual annotation || Not necessary - manual exists || High-level term
|-
| smg-8 || locomotion (GO:0040011) || IMP || RNAi || 37111, Izumi et al., 2010 || No manual annotation from 37111 || || smg-8 encodes a novel protein; only CC annotations exist || No || Downstream effect: phenotype used as output for more general cellular process: effect on locomotion apparently a result of unc-54 transcript stabilization; also incorrect evidence code
|-
| mes-4 || determination of adult lifespan (GO:0008430) || IMP || RNAi || 33449, Curran et al., 2009 || No manual GO annotation for mes-4 from 33449 || || Effect on lifespan may be due to role in regulation of germline transcription, germ cell survival || No || Possible downstream effect - MES-4 is required for germ cell fate specification
|-
| gld-3 || apoptosis (obsolete) || IMP || RNAi || 38381 || maybe a germ cell development annotation? || IMP
|-
| fem-3 || embryonic development ending in birth or egg hatching (GO:0009792) || IMP || RNAi || 5599 || large-scale screen small-scale experiment supports this but is not yet annotated (35459)
|}
  • Note: at least one paper, WBPaper00034662, is included in the rnai.go annotations, but those mappings were made in close collaboration between Gary and me, so the annotations for that paper should be kept.

Goals and strategy moving forward

  1. Remove any annotation redundancy in the pipeline, i.e., cases where a paper has been curated both manually and via the automated pipeline
  2. Determine how many genes have only Phenotype2GO-based GO annotations - this will give us an idea of what we'd lose if we pull the Phenotype2GO annotations
  3. Based on #2, devise a strategy for going forward. Options: remove all automated annotations; keep some but remove the most egregious mappings and change the evidence code (EC) to IEA; put non-redundant but automated IMP or IEA annotations in a separate folder in GO (follow up with CM on this); prioritize genes for manual curation; any combination of the above, or other ideas?
  4. Stop the Phenotype2GO pipeline for now until we get a handle on whether, and how, we can better incorporate such annotations. One thought, if we wanted to keep any, would be to move the pipeline to Caltech, i.e., perform the mappings for new annotations on a regular basis at Caltech and then upload those annotations in bulk to Protein2GO.
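
A minimal sketch of how the redundancy check in goal 1 might be scripted. It assumes manual.go and rnai.go are tab-delimited GAF-style files with the reference in column 6 (pipe-separated when there are several) and '!' marking comment lines; that layout is an assumption, not something confirmed here. The function names, gene IDs, and paper IDs below are made up for illustration.

```python
import io


def read_gaf_rows(handle):
    """Yield the tab-separated columns of each annotation line, skipping '!' comments."""
    for line in handle:
        if line.startswith("!") or not line.strip():
            continue
        yield line.rstrip("\n").split("\t")


def papers_in(handle):
    """Collect every reference ID from column 6 (assumed pipe-separated)."""
    papers = set()
    for cols in read_gaf_rows(handle):
        papers.update(ref for ref in cols[5].split("|") if ref)
    return papers


def doubly_curated_papers(manual_handle, rnai_handle):
    """Return the sorted references cited in both annotation files."""
    return sorted(papers_in(manual_handle) & papers_in(rnai_handle))


def make_row(gene, go_id, refs):
    """Build one made-up annotation line with only the columns used here filled in."""
    cols = ["WB", gene, "", "", go_id, refs, "IMP", "", "P"]
    return "\t".join(cols) + "\n"


# Made-up sample data; real input would come from open("manual.go") and open("rnai.go").
manual_sample = "!gaf-version: 2.0\n" + make_row(
    "WBGene00000001", "GO:0040011", "WB_REF:WBPaper00000001|PMID:0000001"
)
rnai_sample = make_row(
    "WBGene00000001", "GO:0040011", "WB_REF:WBPaper00000001"
) + make_row("WBGene00000002", "GO:0009790", "WB_REF:WBPaper00000002")

shared = doubly_curated_papers(io.StringIO(manual_sample), io.StringIO(rnai_sample))
print(shared)
```

Papers that should be kept regardless, such as WBPaper00034662, could be filtered out of the resulting list before any annotations are removed.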


From the Phenotype2GO (P2GO) annotation file we need the following information:

  • How many unique genes have P2GO annotations
  • How many of these genes have a manual (WB or UniProtKB) or IEA Biological Process annotation
  • How many genes have only P2GO annotations (will be obtained from the above)
  • Input files:
    • manual.go
    • electronic.go
    • rnai.go
    • variation???
  • Rough specs for some scripts to help generate numbers and hopefully strategy
    • From rnai.go file generate list of unique WBGene IDs
    • Using the list of unique WBGene IDs from rnai.go, check the manual.go file for overlapping WBGenes with a P (Biological Process aspect) in Column 9
    • Using the list of unique WBGene IDs from rnai.go, check the electronic.go file for overlapping WBGenes with a P (Biological Process aspect) in Column 9
    • For each file checked, output the list of WBGene IDs that overlap and the remainder, i.e., the list of genes that DO NOT overlap
    • Compare each output list and generate:
      • List of genes with rnai.go annotations but no annotations in either manual.go or electronic.go (intersection of the two DO NOT overlap lists above)
      • List of genes with rnai.go annotations and electronic.go annotations but no manual.go annotations (i.e., no manual, but electronic)
    • Sort the list of WBGenes in each output file by the number of unique phenotype associations in the phenotype GAF, in descending order - the sorted file would list each WBGene ID and its number of phenotypes
      • This will give us a list of genes sorted according to their 'phenotype' information content and might help prioritize future curation
  • Do the same thing with the UniProtKB annotation file?
    • External groups, i.e., UniProtKB? Would we need to convert UniProtKB IDs to WBGene IDs?
    • Using list of unique WBGene IDs from rnai.go, check uniprotkb.go file for overlapping WBGenes with a P in Column 9
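
The rough script specs above could be sketched along the following lines. This is only an illustration under stated assumptions: the input files are taken to be tab-delimited GAF-style files with the WBGene ID in column 2, the term ID in column 5, and the aspect in column 9, and the phenotype GAF is assumed to carry the phenotype term in column 5. All function names and sample IDs are made up.

```python
import io
from collections import defaultdict


def read_gaf_rows(handle):
    """Yield the tab-separated columns of each annotation line, skipping '!' comments."""
    for line in handle:
        if line.startswith("!") or not line.strip():
            continue
        yield line.rstrip("\n").split("\t")


def gene_ids(handle, aspect=None):
    """Unique gene IDs (column 2), optionally restricted to one aspect (column 9)."""
    return {
        cols[1]
        for cols in read_gaf_rows(handle)
        if aspect is None or cols[8] == aspect
    }


def partition(rnai_genes, other_genes):
    """Split the rnai.go genes into (overlap, remainder) relative to another file."""
    return rnai_genes & other_genes, rnai_genes - other_genes


def summarize(rnai_genes, manual_p, electronic_p):
    """Combine the per-file partitions into the two lists described in the specs."""
    _, no_manual = partition(rnai_genes, manual_p)
    has_electronic, no_electronic = partition(rnai_genes, electronic_p)
    return {
        "p2go_only": no_manual & no_electronic,
        "electronic_but_no_manual": no_manual & has_electronic,
    }


def rank_by_phenotype_count(phenotype_handle, genes):
    """Sort genes by number of unique phenotype terms (assumed column 5), descending."""
    phenotypes = defaultdict(set)
    for cols in read_gaf_rows(phenotype_handle):
        if cols[1] in genes:
            phenotypes[cols[1]].add(cols[4])
    return sorted(
        ((gene, len(phenotypes[gene])) for gene in genes),
        key=lambda pair: (-pair[1], pair[0]),
    )


def make_row(gene, term, aspect="P"):
    """Build one made-up annotation line with only the columns used here filled in."""
    cols = ["WB", gene, "", "", term, "WB_REF:WBPaper00000000", "IMP", "", aspect]
    return "\t".join(cols) + "\n"


# Made-up sample data standing in for rnai.go, manual.go, electronic.go, and the phenotype GAF.
G1, G2, G3, G4 = ("WBGene0000000%d" % n for n in range(1, 5))
rnai_sample = "".join(make_row(g, "GO:0040011") for g in (G1, G2, G3, G4))
manual_sample = make_row(G1, "GO:0040011") + make_row(G2, "GO:0009790") + make_row(G3, "GO:0000001", "F")
electronic_sample = make_row(G2, "GO:0009790") + make_row(G3, "GO:0040011")
phenotype_sample = "".join(
    make_row(g, t) for g, t in ((G4, "PHEN:A"), (G4, "PHEN:B"), (G4, "PHEN:A"), (G3, "PHEN:A"))
)

summary = summarize(
    gene_ids(io.StringIO(rnai_sample)),
    gene_ids(io.StringIO(manual_sample), aspect="P"),
    gene_ids(io.StringIO(electronic_sample), aspect="P"),
)
candidates = summary["p2go_only"] | summary["electronic_but_no_manual"]
ranking = rank_by_phenotype_count(io.StringIO(phenotype_sample), candidates)
for gene, count in ranking:
    print(gene, count)
```

The same gene_ids call could be pointed at a uniprotkb.go file once its identifiers have been mapped to WBGene IDs; that mapping step is not sketched here.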


Back to Gene Ontology