Gene Ontology

From WormBaseWiki


Manual Literature Curation

Reference Genome (see also Reference Genome Inferential Annotations)

  1. Lung Development Targets (November 2009 - February 2010)

Transcription-related re-annotation

This section summarizes the annotations that may need to be revised because of changes in how GO represents transcription.

Seven Molecular Function terms will be made obsolete. They are listed below with the number of manual C. elegans annotations associated with each:

  • GO:0003704 specific RNA polymerase II transcription factor activity - 8 (all made by Kimberly)
 ceh-24 - ISS - changed to GO:0000981 sequence-specific DNA binding RNA polymerase II transcription factor activity
 ceh-27 - ISS - same as above
 ceh-28 - ISS - same as above
 elt-1 - IDA - same as above
 elt-3 - IMP - for WBPaper00004593, removed the MF term (we are no longer comfortable making IMP MF annotations from this type of experiment); also made the corresponding BP term less granular, changing it from positive regulation to just gene-specific transcription from pol II promoter
 elt-3 - IMP - same as for elt-3 above
 hlh-3 - IMP - for WBPaper00031977, removed the MF term for the same reason as above and made the BP term less granular as above
 zip-2 - IMP - for WBPaper00035891, same as above for elt-3 and hlh-3
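The term swap and the IMP removals above can be sketched as a small remapping pass. This is a hypothetical illustration, not the procedure actually used by the curator. The record layout (dicts with `gene`, `go_id` and `evidence` keys), the `OBSOLETE_TO_REPLACEMENT` table and the `remap` function are all assumptions. The BP granularity changes are left to manual curation.

```python
# Hypothetical sketch: annotation records are plain dicts, and the
# replacement table is maintained by hand from the decisions listed above.
OBSOLETE_TO_REPLACEMENT = {
    # specific RNA polymerase II transcription factor activity ->
    # sequence-specific DNA binding RNA polymerase II transcription factor activity
    "GO:0003704": "GO:0000981",
}


def remap(annotations, table=OBSOLETE_TO_REPLACEMENT):
    """Split annotations into (kept, dropped).

    Annotations to an obsolete term are moved to its replacement, except IMP
    annotations, which are dropped because IMP MF calls from this type of
    experiment are no longer made. All other annotations pass through unchanged.
    """
    kept, dropped = [], []
    for ann in annotations:
        new_id = table.get(ann["go_id"])
        if new_id is None:
            kept.append(ann)
        elif ann["evidence"] == "IMP":
            dropped.append(ann)
        else:
            kept.append({**ann, "go_id": new_id})
    return kept, dropped


if __name__ == "__main__":
    demo = [
        {"gene": "ceh-24", "go_id": "GO:0003704", "evidence": "ISS"},
        {"gene": "elt-1", "go_id": "GO:0003704", "evidence": "IDA"},
        {"gene": "zip-2", "go_id": "GO:0003704", "evidence": "IMP"},
    ]
    kept, dropped = remap(demo)
    print([(a["gene"], a["go_id"]) for a in kept])
    print([a["gene"] for a in dropped])
```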

Migration to UniProtKB Protein2GO Curation Tool

  1. UniProt-GOA syntax checking
  2. File Specifications for Downloading Manual Annotations for Protein2GO
  3. GAF to .ace file

Semi-Automated Methods of Curation

Textpresso-Based Curation

GO Cellular Component Curation - MOD-Specific Pages
General specifications
dictyBase
FlyBase
TAIR (older page, no longer used)
TAIR_CCC
WormBase
GO Cellular Component Curation - General Issues
Processing Gene and Protein Names for Searches and Curation
Specifications for CCC Curation from Textpresso Search Page
CCC Form 2.0 Specifications
  • MFC - GO Molecular Function Curation using Textpresso
mf_hmm tool
in vitro flagging

Phenotype2GO pipeline (Sanger and Caltech)

  • The old Sanger script that generates the gene_association file (from Igor's work in January 2009) was changed. Instead of an exclusion list, an 'include list' of papers (mostly large-scale genome-wide studies) is now provided to the script. This list is curator-approved and explicitly agreed upon for the propagation of GO terms to genes based on their RNAi phenotypes.
  • A new script is used. To use it, invoke the script with the -includelist option, e.g., run: parse_go_terms_new.pl -o gene_association.wb -rnai -include includelist.txt (this example parses only RNAi experiments; to generate the full file, also give the '-gene -var' options as before).
  • If you invoke it with the '-acefile <filename>' option, the script will also generate Gene-GO_term connections derived from phenotypes. This is currently done by the phenotype procedure of the inherit_GO_terms.pl script.
  • The old script, inherit_GO_terms.pl, does not consult any exclusion or inclusion files. A patch file was provided to alter Sanger's version of parse_go_terms_new.pl.
  • Current status: From Igor's e-mail, March 2009: I don't think the phenotype option of the inherit_go_terms script has been disabled. The script should be run without the '-variation' option, but the gene_association file still has those. Try this:

    grep -i wbpheno gene_association.WS200.wb.ce | grep -v RNAi

This is now resolved.
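The include-list logic and the grep check above can be mirrored in a few lines of Python. This is a sketch only and does not reproduce parse_go_terms_new.pl. The one-ID-per-line include-list layout, the '#' comment convention and the dict-based annotation records are assumptions.

```python
def load_include_list(lines):
    """Return the set of curator-approved paper IDs, one per line.

    Blank lines and lines starting with '#' are skipped (an assumed convention,
    not a documented feature of the real include list).
    """
    ids = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            ids.add(line)
    return ids


def filter_phenotype_annotations(annotations, include):
    """Keep only RNAi phenotype-derived annotations whose paper is on the include list."""
    return [a for a in annotations if a["paper"] in include]


def leftover_phenotype_lines(gaf_lines):
    """Same test as: grep -i wbpheno FILE | grep -v RNAi

    The first match is case-insensitive and the exclusion is case-sensitive,
    as in the two grep calls.
    """
    return [ln for ln in gaf_lines if "wbpheno" in ln.lower() and "RNAi" not in ln]
```

An empty result from `leftover_phenotype_lines` on a release file would correspond to the "now resolved" state described above.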

InterPro2GO Mappings for IEA Annotations
Reference Genome Inferential Annotations

Software Development: Tools and Scripts

Reference Genome Reports - Annotation Coverage

Ontology Annotator - The GO annotation interface

Textpresso related forms

WormBase gene association file

Taxon Constraints

From Chris Mungall, 8/19/2011:

The taxon checks are run weekly, and the reports deposited here:

   http://www.geneontology.org/quality_control/annotation_checks/taxon_checks/

Note that this service will be subsumed into a more comprehensive annotation QC service (apologies if you weren't at the USC meeting, where this was demoed). This is, in general, the plan for many of the ad-hoc scripts and cron reports we perform now. I will send an email to the GOC list next week describing the roll-out process for this.

For the QC checks, the idea is to push the checking as far upstream as possible. A weekly report is too reactive. This could be done at the time of submission. Even better, the annotation tool could use the central web service at the time of annotation.
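A check run at annotation time, as suggested above, amounts to comparing a term's taxon constraints with the lineage of the annotated organism. The sketch below is illustrative only:

- The `CONSTRAINTS` entries are made-up placeholders, not real GO constraints.
- The C. elegans lineage is abbreviated.
- A real service would also apply constraints inherited from a term's ancestors.

```python
# Abbreviated NCBI Taxonomy lineage, species first (not the full lineage).
LINEAGE = {
    "NCBITaxon:6239": [   # Caenorhabditis elegans
        "NCBITaxon:6239",
        "NCBITaxon:6231",  # Nematoda
        "NCBITaxon:33208",  # Metazoa
        "NCBITaxon:2759",  # Eukaryota
    ],
}

# Hypothetical constraint table: placeholder term IDs -> (relation, taxon) pairs.
CONSTRAINTS = {
    "GO:PLANT_ONLY_EXAMPLE": [("only_in_taxon", "NCBITaxon:33090")],  # Viridiplantae
    "GO:NEVER_IN_NEMATODA_EXAMPLE": [("never_in_taxon", "NCBITaxon:6231")],
}


def taxon_violations(go_id, taxon):
    """Return human-readable violations of go_id's constraints for taxon."""
    lineage = set(LINEAGE[taxon])
    problems = []
    for relation, constraint_taxon in CONSTRAINTS.get(go_id, []):
        if relation == "only_in_taxon" and constraint_taxon not in lineage:
            problems.append(f"{go_id} is only_in_taxon {constraint_taxon}")
        elif relation == "never_in_taxon" and constraint_taxon in lineage:
            problems.append(f"{go_id} is never_in_taxon {constraint_taxon}")
    return problems
```

An annotation tool would call something like `taxon_violations` before saving and refuse or flag the annotation when the list is non-empty. That is the upstream placement described in the e-mail.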

WormBase contributions to Gene Ontology content

2013
  • early endosome to recycling endosome transport
  • corrected definition of recycling endosome
2012
  • nematode larval development, heterochronic
  • regulation of nematode larval development, heterochronic
  • transforming growth factor receptor signaling pathway involved in multicellular organism growth
  • insulin receptor signaling pathway involved in determination of adult lifespan
  • positive, negative regulation of oviposition
  • pairing center
  • muscle projection, muscle projection membrane (narrow synonyms: myopodia, muscle arm)
  • regulation of synaptic plasticity by receptor localization to synapse
  • regulation of basement membrane organization
  • regulation of RNA interference
  • regulation, positive, negative of oocyte maturation
  • incorrect InterPro2GO mapping for IPR003131
  • dishabituation
  • double-stranded DNA-dependent ATPase activity
  • new representation of tail tip morphogenesis
  • synonym for nuclear inner membrane
  • change definition of apical junction complex
  • regulation, positive, negative of serine-type endopeptidase activity
  • regulation, positive, negative of neuromuscular synaptic transmission
2011
  • protein binding - SUMO conjugating enzyme
  • regulation of neuron migration
  • basement membrane assembly involved in embryonic body morphogenesis
  • parentage of dauer larval development - also include dormancy process
  • regulation of ATP biosynthetic process
  • regulation, positive, negative of dipeptide transport
  • regulation of phospholipid transport
  • regulation, positive, negative of endocytic recycling
  • suggested change to InterPro2GO mapping for GoLoco motif
  • GABAergic neuron differentiation
  • pre-mRNA binding
  • nitric oxide sensory activity
  • age-dependent behavioral decline
  • regulation, positive, negative of anterograde axon cargo transport and retrograde axon cargo transport
  • aggrephagy
  • germ cell proliferation
  • in progress - centrosome maturation - when, what, how
  • defecation motor program
  • modifications to terms and definitions of cilium assembly and sensory cilium assembly
  • ciliary transition zone
  • regulation and pos/neg regulation of microtubule motor activity
  • nickel ion homeostasis and cellular nickel ion homeostasis
  • neurotransmitter receptor catabolic process
2010
  • regulation of defecation, positive and negative children (2010)
  • mitochondrial prohibitin complex (2010)
  • cilium terms (2010, updates/revisions to terms added in 2005)
  • octopamine/tyramine signaling involved in the response to food (and the regulation terms) (2010)
  • alpha-tubulin acetylation (2010)
  • phagosome maturation involved in apoptotic cell clearance (2010)
  • phagosome acidification involved in apoptotic cell clearance (2010)
  • phagolysosome assembly involved in apoptotic cell clearance (2010)
  • phagosome-lysosome docking involved in apoptotic cell clearance (2010)
  • phagosome-lysosome fusion involved in apoptotic cell clearance (2010)
  • neuropeptide receptor binding (2010)
  • striated muscle contraction involved in embryonic body morphogenesis (2010)
  • striated muscle myosin thick filament assembly (2010)
  • striated muscle paramyosin thick filament assembly (2010)
  • determination of left/right asymmetry in the nervous system (2010)
  • regulation of locomotion (including positive and negative regulation child terms) involved in locomotory behavior (2010)
  • detoxification of arsenic (2010)
  • chondroitin sulfate proteoglycan binding (2010)
  • chondroitin sulfate binding (2010)
  • regulation (includes positive and negative regulation child terms) of nematode larval development (2010)
  • regulation of (includes positive and negative regulation terms) dauer larval development (2010)
2009
  • response to drug withdrawal (2009)
  • phosphatidylserine exposure on apoptotic cell surface (2009)
2008
  • regulation of synaptic vesicle priming (2008)
  • chloride-activated potassium channel activity (2008)
  • transdifferentiation (2008)
  • Regulation of ovulation terms (2008)
  • Process terms for gap junction proteins (2008)
  • piRNA and 21U-RNA terms (2008)
2007
  • dense body (sensu Nematoda) cellular component term (2007)
  • GO:0000775, GO:0000779, GO:0000780
  • D/V and A/P axon guidance terms (2007)
  • palmitoyl-CoA 9-desaturase activity (2007)
  • response to hyperoxia (2007)
  • Cuticle component terms (2007)
  • response to anoxia (2007)
2006
  • dynein light intermediate chain binding (2006)
  • Regulation terms for cell and nuclear division (2006)
  • Several child terms for apoptosis (2006)
2005
  • Cilium terms (2005)
2004
  • Intraflagellar transport particle-component terms (2004)
  • oogenesis (non-species-specific term) (2004)
Modifications to the Ontology
  • Revised definition for muscle homeostasis (2010)
  • Added dense core vesicle synonym to dense core granule (2010)
  • Updated definition and moved parentage for intraflagellar transport (2009)
  • Added lethargus as synonym for sleep (2008)
  • Change to the definitions of the component terms GO:0000775, GO:0000779 and GO:0000780, which refer to the centromere or 'chromosome, pericentric region' (2007)
  • Change to parent of tail tip morphogenesis (sensu Nematoda) (2006)
  • GO:0046536, dosage compensation complex definition (2006)

Annotation Practices

Cellular Component Annotations

If a protein contains a transmembrane domain, but expression experiments are not at sufficient resolution to show membrane localization, what annotation should we make?

Example: WBPaper00036024


WormBase use of Column 16

Column 16 is a column in the Gene Ontology (GO) tab-delimited gene association file (GAF) that WormBase submits to the GO Consortium on a regular basis.

Column 16 is referred to as the Annotation Extension column because it holds curation details that a GO term alone cannot capture, for example the substrate upon which an enzyme acts.

A number of different types of information could conceivably be entered into Column 16. The list below begins to document how WormBase curators may use Column 16, along with any additional information or questions that have arisen during curation.

In the GAF, there will be an explicit relationship between the entity in Column 16 and the GO term. The annotation extension relations are viewable here:

http://www.geneontology.org/scratch/xps/go_annotation_extension_relations.obo

Column 16 curation at WormBase is just beginning and will likely be fleshed out more fully over the next few months.

In the Ontology Annotator, Column 16 data is being entered into the 'Xref to' field in the following format: Column 16: Xref ID
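To illustrate how an OA entry maps onto the GAF, the sketch below does three things:

- converts a 'Column 16: Xref ID' value into the GAF 2.0 relation(ID) syntax;
- parses such a column back into its parts;
- pulls column 16 out of a GAF line.

The function names are hypothetical. The relation must be supplied by the curator, since the OA field as described does not store it. In GAF 2.0, '|' separates independent extensions and ',' joins the relation(ID) pairs within one extension.

```python
def oa_to_extension(oa_value, relation):
    """Turn an OA 'Xref to' value such as 'Column 16: WB:WBGene00006605'
    into GAF extension syntax, e.g. 'has_regulation_target(WB:WBGene00006605)'."""
    prefix = "Column 16:"
    if not oa_value.startswith(prefix):
        raise ValueError(f"not a Column 16 entry: {oa_value!r}")
    xref = oa_value[len(prefix):].strip()
    return f"{relation}({xref})"


def parse_extension(col16):
    """Parse column 16 into a list of extensions, each a list of (relation, id)."""
    extensions = []
    for group in filter(None, col16.split("|")):
        pairs = []
        for item in group.split(","):
            relation, _, rest = item.strip().partition("(")
            pairs.append((relation, rest.rstrip(")")))
        extensions.append(pairs)
    return extensions


def column16(gaf_line):
    """Return column 16 (1-based) of a tab-delimited GAF 2.0 line, or '' if absent."""
    fields = gaf_line.rstrip("\n").split("\t")
    return fields[15] if len(fields) >= 16 else ""
```

For the sup-26 example below, `oa_to_extension("Column 16: WB:WBGene00006605", "has_regulation_target")` yields the relation(ID) form of the tra-2 target.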


Biological Process Examples:


Translational Regulation

Example 1: sup-26 is annotated to GO:0017148, negative regulation of translation. The entry in Column 16 is the target of that regulation, tra-2.

In OA entry: Column 16: WB:WBGene00006605

    [Typedef]
    id: has_regulation_target
    name: has_regulation_target
    def: "Identifies a gene or gene product affected by a regulation BP or regulator MF." [GOC:mah]
    comment: probably want to add one or two new subtypes that capture something about directness
    domain: GO:0065007 ! biological regulation
    range: TEMP:0000003 ! gene or gene product
    is_a: OBO_REL:has_participant
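The Typedef above uses the OBO flat-file "tag: value" layout, with trailing "! label" comments, so a very small parser can read it. This sketch handles only the subset used here: one stanza, no escapes, no trailing modifiers. It is not a replacement for a full OBO library.

```python
import re


def parse_stanza(text):
    """Parse one OBO stanza into (stanza_type, {tag: [values]}).

    Trailing '! label' comments are dropped, except on def and comment lines,
    whose values are kept verbatim.
    """
    stanza_type, tags = None, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = re.fullmatch(r"\[(\w+)\]", line)
        if header:
            stanza_type = header.group(1)
            continue
        tag, _, value = line.partition(":")
        value = value.strip()
        if tag not in ("def", "comment"):
            value = value.split(" ! ")[0].strip()
        tags.setdefault(tag, []).append(value)
    return stanza_type, tags
```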


mRNA Processing

Example: sup-12 mutations affect splicing of unc-60 transcripts.

Reference: WBPaper00024604


Defense Response

Example 1: lys-7 is required for the defense response to Cryptococcus neoformans.

In OA entry: Column 16: NCBI:192011 (a taxon ID)

Response to Terms

Example 1: daf-2 is shown to be involved in response to oxidative stress by treating animals with paraquat. WBPaper00005488

In OA entry: annotate to 'response to oxidative stress' using CHEBI:34905 (paraquat)

Cell Fate Specification

Example 1: egl-38 is found to be required for cell fate specification in the male tail. WBPaper00002924

A number of Anatomy terms could be added to Column 16 (not done yet).

Regulation of Protein Localization

Example 1: hmp-1 and jac-1;hmp-1 double mutants are shown to affect the distribution of HMR-1. WBPaper00005972

Added WBGene ID of HMR-1 to Column 16.


Molecular Function Examples:

Nucleic Acid Binding

Example 1: sup-26 is annotated to GO:0003730, mRNA 3'-UTR binding. The entry in Column 16 is the target of that binding, tra-2.

In OA entry: Column 16: WB:WBGene00006605

Plans/Projects in progress

?GO_annotation model

  • The GO annotation model is becoming increasingly complex, capturing more annotation detail.
  • The proposed model below would create a #GO_annotation_info hash.
    • 'With' or 'From', for the additional identifiers used with certain evidence codes, such as IPI and IGI
    • Annotation Extension for capturing cross-references to other ontologies
    • Qualifying an annotation with the GO qualifiers 'not', 'contributes_to', or 'colocalizes_with'
    • Uses the Database tag to capture GO_REF IDs for annotations that are based not on a published paper but on a documented GO curation practice (used, for example, for ND annotations; see GO References)
    • If an experiment describes the activity of a specific protein isoform, this would be captured in the Annotated_isoform tag with a UniProtKB accession.

Proposed new ?GO_annotation_info hash:

    ?Gene GO_term ?GO_term XREF Gene Evidence_code ?Evidence_code #GO_annotation_info   
                   
                   
    ?GO_annotation_info  Annotation_made_with_from ?Gene
                                                   ?Motif
                                                   ?RNAi
                                                   ?Variation
                                                   ?Text
                         Annotation_extension Relationships happens_during        ?Life_stage
                                                            has_direct_input      ?Gene
                                                            has_direct_input      Database ?Database ?Database_field ?Accession_number
                                                            has_regulation_target ?Gene
                                                            occurs_in             ?Anatomy_term
                                                            part_of               ?Anatomy_term
                         Annotation_qualifier not
                                              colocalizes_with
                                              contributes_to
                         Annotated_isoform   Database  ?Database  ?Database_field  ?Accession_number  //e.g., UniProtKB:Q9N5D6-1 
                         Database ?Database  ?Database_field  ?Accession_number  //e.g., GO_REF:0000015
                         Reference ?Paper
                         Curator_confirmed ?Person
                         Date_last_updated UNIQUE DateType
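To make the proposed hash concrete, here is a hypothetical Python rendering of one annotation with its #GO_annotation_info. The class and field names are assumptions that loosely mirror the ACeDB tags. The point is that the qualifiers and extension relations are closed vocabularies, so they can be validated when a curator saves an annotation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Closed vocabularies copied from the proposed ?GO_annotation_info model.
QUALIFIERS = {"not", "colocalizes_with", "contributes_to"}
EXTENSION_RELATIONS = {
    "happens_during", "has_direct_input", "has_regulation_target",
    "occurs_in", "part_of",
}


@dataclass
class GOAnnotationInfo:
    made_with_from: List[str] = field(default_factory=list)          # Gene/Motif/RNAi/Variation/Text IDs
    extensions: List[Tuple[str, str]] = field(default_factory=list)  # (relation, target ID)
    qualifier: Optional[str] = None
    annotated_isoform: Optional[str] = None  # e.g. UniProtKB:Q9N5D6-1
    database_ref: Optional[str] = None       # e.g. GO_REF:0000015
    reference: Optional[str] = None          # a ?Paper ID
    curator_confirmed: Optional[str] = None  # a ?Person ID
    date_last_updated: Optional[str] = None

    def __post_init__(self):
        # Reject values outside the model's closed vocabularies.
        if self.qualifier is not None and self.qualifier not in QUALIFIERS:
            raise ValueError(f"unknown qualifier: {self.qualifier}")
        for relation, _ in self.extensions:
            if relation not in EXTENSION_RELATIONS:
                raise ValueError(f"unknown extension relation: {relation}")


@dataclass
class GOAnnotation:
    gene: str
    go_term: str
    evidence_code: str
    info: GOAnnotationInfo
```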

?GO_term model

Proposal for new ?GO_term model:

    ?GO_term Name UNIQUE ?Text
             Status UNIQUE Valid
                           Obsolete
             Namespace UNIQUE Biological_process
                              Cellular_component
                              Molecular_function
             Alternate_id ?Text
             Definition UNIQUE ?Text
             Comment Text
             Synonym Broad ?Text
                     Exact ?Text
                     Narrow ?Text
                     Related ?Text
             Parent Is_a          ?GO_term XREF Is     //Except for the root terms, all terms should have this
             Child  Is            ?GO_term XREF Is_a
             Part_of              ?GO_term
             Regulates            ?GO_term
             Negatively_regulates ?GO_term
             Positively_regulates ?GO_term
             Has_part             ?GO_term
             Intersection_of ?GO_term     //For specifying origin of cross-products.
                             ?Text        //This will be a relation ontology term and external ontology term.  Eventually articulate?
             Consider ?GO_term            //Gives a term which may be an appropriate substitute for an obsolete term.
             Replaced_by ?GO_term         //Gives a term which replaces an obsolete term.        
             Attribute_of ?Motif                XREF GO_term   //Needed for InterPro2GO mapping.
                          ?Gene                 XREF GO_term   //Annotated entity for manual annotations.
                          ?CDS                  XREF GO_term   //Annotated entity for InterPro2GO and Phenotype2GO mappings.
                          ?Sequence             XREF GO_term   //Still needed?
                          ?Transcript           XREF GO_term   //Still needed?
                          ?Phenotype            XREF GO_term
                          ?Anatomy_term         XREF GO_term   
                          ?Homology_group       XREF GO_term   //Still needed?
                          ?Expr_pattern         XREF GO_term
                          ?Expression_cluster   XREF GO_term
                          ?Picture              XREF Cellular_component
                          ?WBProcess            XREF GO_term
             Index Ancestor   ?GO_term      //Consider transitivity.  Is this what's used for web display?
                   Descendent ?GO_term      //Consider transitivity.  Is this what's used for web display?
             Version UNIQUE Text            //SVN revision number
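The Index Ancestor/Descendent tags raise the question of transitivity. One way to fill them is to precompute the transitive closure of the parent links, as in the sketch below. Which links to pass in is a design decision. Is_a and Part_of are the usual candidates, while the regulation links do not chain the same way. The toy term IDs in the test are placeholders.

```python
from collections import deque


def ancestors(term, parents):
    """All terms reachable from `term` by following parent links (breadth-first).

    `parents` maps a term ID to its direct parents.
    """
    seen = set()
    queue = deque(parents.get(term, ()))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents.get(current, ()))
    return seen


def descendants(term, parents):
    """All terms below `term`: invert the parent map, then reuse ancestors()."""
    children = {}
    for child, direct_parents in parents.items():
        for parent in direct_parents:
            children.setdefault(parent, []).append(child)
    return ancestors(term, children)
```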

Evidence Code Ontology Model

Proposal for a new ?Evidence_code object:

    ?Evidence_code  Name ?Text
                    Status UNIQUE Valid
                                  Obsolete
                    Namespace ?Text
                    Alternate_id ?Text
                    Definition ?Text
                    Comment Text
                    Synonym ?Text Scope_modifier UNIQUE Broad
                                                        Exact
                                                        Narrow
                                                        Related
                    DB_info    Database  ?Database  ?Database_field   ?Accession_number  Text  ##PSI-MI, GO_REF, GOECO
                    Relationships is_a   ?Evidence_code      //Except for the root terms, all terms should have this
                    Intersection_of ?GO_term 
                                    ?Text  //This will be a relation ontology term and external ontology term. Eventually articulate?  
                    Created_by Text
                    Creation_date Text
                    Version UNIQUE Text

Progress Report 2011

Papers that use C. elegans GO Annotations

  • WBPaper00035429
    • The resulting fold-changes and p-values were then used for GOMiner [36] and Cytoscape [37,38] analyses.
    • Detailed descriptions of GOMiner and Cytoscape analyses are accessible as Supplemental methods.
    • Detailed GOMiner and Cytoscape (jActiveModules and BiNGO) analyses of strain-to-strain differences under both UV and control conditions can be found in Supplemental data files 2–5 (GOMiner) and 6–7 (BiNGO).
    • Differentially expressed transcripts (defined as absolute fold change value > 1.3, a log ratio p-value < 0.05 by Rosetta Resolver, and a log(10) intensity measurement > −0.4) along with fold-change, p-values and GO annotation are listed for each strain and year in Supplemental data file 8.
    • 3.9. Gene ontology (GO) biological processes altered 3 h post-UVC exposure
    • Since our list of UVC-regulated genes based on a minimal 1.3-fold change was relatively short (Table 1), we also carried out a jActiveModules network analysis. This algorithm can identify subnetworks highly enriched in regulated genes even if the fold-changes are not large, since it is based only on p-values [37].
    • 36. Zeeberg BR, Feng W, Wang G, Wang MD, Fojo AT, Sunshine M, Narasimhan S, Kane DW, Reinhold WC, Lababidi S, Bussey KJ, Riss J, Barrett JC, Weinstein JN. GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol. 2003;4:R28.
    • 37. Ideker T, Ozier O, Schwikowski B, Siegel AF. Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics. 2002;18 Suppl 1:S233–S240.
  • WBPaper00040880 - The 662 genes associated with the modification of 128Q-neuron dysfunction appeared to encompass a variety of biological processes (cell death, protein folding, intracellular transport, metabolic processes, response to stress, stress-activated pathways) that may have a role in neurodegenerative disease pathogenesis as suggested by their functional classification using GO annotations (Figure 3, Additional file 7: Tables S6; Additional file 8: Table S7).
    • GO enrichment tests were performed using Ontologizer v2.0.
    • Bauer S, Grossmann S, Vingron M, Robinson PN. Ontologizer 2.0--a multifunctional tool for GO term enrichment analysis and data exploration.
    • In this respect, the network-boosted data analysis of our RNAi dataset was more instructive compared to the sole use of GO annotations or gene set enrichment analysis.
  • WBPaper00040998
    • To identify pathways and molecular functions common to the genes observed by microarray analysis, we employed the gene ontology (GO) enrichment analysis using GOrilla [21]. As expected due to NHR-49’s known role in lipid biology, there was a significant overrepresentation of GO-terms for functions related to fat metabolism (Figure 2 and Table 2). We also found that pathways regulating protein processing, maturation and proteolysis were overrepresented.
    • Figure 2. Functional classification summary for the nhr-49 mutant are represented as a scatter plot using the GO visualization tool REViGO.
    • Gene ontology (GO) enrichment analysis was performed using GOrilla [21]. Each list from the limma analysis was ranked from smallest to largest p-value and analyzed for enriched biological process ontology terms found near the top of the list. Functional classification summary for the nhr-49 mutant were presented as a scatter plot using the GO visualization tool REViGO [51].
    • Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z (2009) GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics 10: 48.
    • Supek F, Bošnjak M, Škunca N, Šmuc T (2011) REVIGO Summarizes and Visualizes Long Lists of Gene Ontology Terms. PLoS ONE 6: e21800. doi:10.1371/journal.pone.0021800.
  • WBPaper00041080
  • WBPaper00041771
    • Gene ontology classification was acquired using the Database for Annotation, Visualization, and Integrated Discovery (DAVID, http://david.abcc.ncifcrf.gov/).
    • DAVID identified many biological themes among our list of STAU-1 targets. These included embryonic, larval, and reproductive development (Fig. 5B and Supplemental Dataset 1). These are consistent with Staufen’s previously characterized role in developmental patterning in Drosophila oocytes and embryos (32,33,62).
    • In addition, major GO terms associated with the human Staufen targets include cellular metabolism and cellular processes (42), and are not similar to the GO terms associated with C. elegans STAU-1 targets. We note that our studies analyzed STAU-1-associated RNAs in whole animals containing a wide array of cell types, whereas the human proteins were analyzed in a cultured cell line. Thus the biological meaning of the apparent differences in the targets of the human and worm proteins is uncertain.
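Most of the tools cited above (GOMiner, Ontologizer, GOrilla, DAVID) start from some form of over-representation test. The sketch below is the plain term-for-term hypergeometric version, not the exact method of any of those tools. GOrilla works on ranked lists, Ontologizer offers parent-child methods, and DAVID uses a modified Fisher exact test. All of them also need multiple-testing correction, which is omitted here. The function names are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric P(X >= k).

    This is the chance of seeing at least k term-annotated genes in a study
    list of n genes, when K of the N population genes carry the term.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def term_enrichment(study, population, gene2terms):
    """Return {term: p-value} for each term annotated to at least one study gene.

    `study` must be a subset of `population`; `gene2terms` maps gene -> terms.
    """
    study, population = set(study), set(population)
    N, n = len(population), len(study)
    in_population, in_study = {}, {}
    for gene in population:
        for term in gene2terms.get(gene, ()):
            in_population[term] = in_population.get(term, 0) + 1
            if gene in study:
                in_study[term] = in_study.get(term, 0) + 1
    return {t: enrichment_p(k, n, in_population[t], N) for t, k in in_study.items()}
```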