Gene Ontology
Contents
- 1 Manual Literature Curation
- 2 Semi-Automated Methods of Curation
- 2.1 Textpresso-Based Curation
- 2.2 Phenotype2GO pipeline (Sanger and Caltech)
- 3 Software Development: Tools and Scripts
- 4 Taxon Constraints
- 5 WormBase contributions to Gene Ontology content
- 6 Annotation Practices
- 7 Plans/Projects in progress
- 8 Model change
Manual Literature Curation
Reference Genome (see also Reference Genome Inferential Annotations)
This summarizes the annotations that may need to be revised due to changes in the GO's representation of transcription.
Seven Molecular Function terms will be obsoleted. They are listed below with the number of manual C. elegans annotations associated with each:
- GO:0003704 specific RNA polymerase II transcription factor activity - 8 (all Kimberly)
ceh-24 - ISS - changed to GO:0000981 sequence-specific DNA binding RNA polymerase II transcription factor activity
ceh-27 - ISS - same as above
ceh-28 - ISS - same as above
elt-1 - IDA - same as above
elt-3 - IMP - for WBPaper00004593, removed MF term (no longer comfortable with IMP MF terms from this type of experiment); also made corresponding BP term less granular, from positive regulation to just gene-specific transcription from pol II promoter
elt-3 - IMP - same as for elt-3 above
hlh-3 - IMP - for WBPaper00031977, removed MF term for same reason as above, also made BP term less granular as above
zip-2 - IMP - for WBPaper00035891, same as above for elt-3 and hlh-3
Semi-Automated Methods of Curation
Textpresso-Based Curation
GO Cellular Component Curation - MOD-Specific Pages
General specifications
dictyBase
FlyBase
TAIR (this is the older page no longer used)
TAIR_CCC
WormBase
GO Cellular Component Curation - General Issues
Processing Gene and Protein Names for Searches and Curation
Specifications for CCC Curation from Textpresso Search Page
- MFC - GO Molecular Function Curation using Textpresso
mf_hmm tool
in vitro flagging
Phenotype2GO pipeline (Sanger and Caltech)
- The old Sanger script that generates the gene_association file (from Igor's work in January 2009) was changed. Instead of an exclusion list, an 'include list' of papers (mostly large-scale, genome-wide studies) is provided to the script. This list is curator-approved: its papers are explicitly agreed upon for the propagation of GO terms to genes based on their RNAi phenotypes.
- A new script is used. Invoke it with the -includelist option, for example, run: parse_go_terms_new.pl -o gene_association.wb -rnai -include includelist.txt (this example parses only RNAi experiments; to generate the full file, also give the '-gene -var' options as before).
- If you invoke it with the '-acefile <filename>' option, the script will also generate Gene-GO_term connections derived from phenotypes. This is currently done by the phenotype procedure of the inherit_GO_terms.pl script.
- The old script, inherit_GO_terms.pl, does not consult any exclusion/inclusion files. To alter Sanger's version of parse_go_terms_new.pl, a patch file was provided.
- Current status: From Igor's e-mail, March 2009: I don't think the phenotype option of the inherit_go_terms script has been disabled. The script should be run without the '-variation' option, but the gene_association file still has those. Try this:
grep -i wbpheno gene_association.WS200.wb.ce | grep -v RNAi
This is now resolved.
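The include-list behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the logic of parse_go_terms_new.pl itself: the include-file format (one paper ID per line, '#' comments), the record layout, and all IDs shown are assumptions made for the example.

```python
def load_include_list(lines):
    """Return the set of paper IDs, skipping blank lines and '#' comments.

    Assumed file format: one paper ID per line (hypothetical).
    """
    papers = set()
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            papers.add(entry)
    return papers


def filter_by_include_list(annotations, include_papers):
    """Keep only phenotype-derived annotations whose paper is on the include list."""
    return [a for a in annotations if a["paper"] in include_papers]


# Placeholder IDs only; these are not real WormBase papers or curated GO annotations.
include_papers = load_include_list([
    "# curator-approved large-scale RNAi studies",
    "WBPaper_A",
    "",
    "WBPaper_B  # genome-wide screen",
])

annotations = [
    {"gene": "gene-1", "go_id": "GO:PLACEHOLDER1", "paper": "WBPaper_A"},
    {"gene": "gene-2", "go_id": "GO:PLACEHOLDER2", "paper": "WBPaper_C"},
]

kept = filter_by_include_list(annotations, include_papers)
print(kept)
```

With an include list, a paper that nobody has reviewed contributes no annotations by default, whereas an exclusion list would let it through until someone blocks it.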
InterPro2GO Mappings for IEA Annotations
Reference Genome Inferential Annotations
Software Development: Tools and Scripts
Reference Genome Reports - Annotation Coverage
Ontology Annotator - The GO annotation interface
Taxon Constraints
From Chris Mungall, 8/19/2011:
The taxon checks are run weekly, and the reports deposited here:
http://www.geneontology.org/quality_control/annotation_checks/taxon_checks/
Note that this service will be subsumed into a more comprehensive annotation QC service (apologies if you weren't at the USC meeting, where this was demoed). This is, in general, the plan for many of the ad-hoc scripts and cron reports we perform now. I will send an email to the GOC list next week describing the roll-out process for this.
For the QC checks, the idea is to push the checking as far upstream as possible. A weekly report is too reactive. This could be done at the time of submission. Even better, the annotation tool could use the central web service at the time of annotation.
WormBase contributions to Gene Ontology content
2012
- regulation, positive, negative of oocyte maturation
- incorrect InterPro2GO mapping for IPR003131
- dishabituation
- double-stranded DNA-dependent ATPase activity
- new representation of tail tip morphogenesis
- synonym for nuclear inner membrane
- change definition of apical junction complex
- regulation, positive, negative of serine-type endopeptidase activity
- regulation, positive, negative of neuromuscular synaptic transmission
2011
- protein binding - SUMO conjugating enzyme
- regulation of neuron migration
- basement membrane assembly involved in embryonic body morphogenesis
- parentage of dauer larval development - also include dormancy process
- regulation of ATP biosynthetic process
- regulation, positive, negative of dipeptide transport
- regulation of phospholipid transport
- regulation, positive, negative of endocytic recycling
- suggested change to InterPro2GO mapping for GoLoco motif
- GABAergic neuron differentiation
- pre-mRNA binding
- nitric oxide sensory activity
- age-dependent behavioral decline
- regulation, positive, negative of anterograde axon cargo transport and retrograde axon cargo transport
- aggrephagy
- germ cell proliferation
- in progress - centrosome maturation - when, what, how
- defecation motor program
- modifications to terms and definitions of cilium assembly and sensory cilium assembly
- ciliary transition zone
- regulation and pos/neg regulation of microtubule motor activity
- nickel ion homeostasis and cellular nickel ion homeostasis
- neurotransmitter receptor catabolic process
2010
- regulation of defecation, positive and negative children (2010)
- mitochondrial prohibitin complex (2010)
- cilium terms (2010, updates/revisions to terms added in 2005)
- octopamine/tyramine signaling involved in the response to food (and the regulation terms) (2010)
- alpha-tubulin acetylation (2010)
- phagosome maturation involved in apoptotic cell clearance (2010)
- phagosome acidification involved in apoptotic cell clearance (2010)
- phagolysosome assembly involved in apoptotic cell clearance (2010)
- phagosome-lysosome docking involved in apoptotic cell clearance (2010)
- phagosome-lysosome fusion involved in apoptotic cell clearance (2010)
- neuropeptide receptor binding (2010)
- striated muscle contraction involved in embryonic body morphogenesis (2010)
- striated muscle myosin thick filament assembly (2010)
- striated muscle paramyosin thick filament assembly (2010)
- determination of left/right asymmetry in the nervous system (2010)
- regulation of locomotion (including positive and negative regulation child terms) involved in locomotory behavior (2010)
- detoxification of arsenic (2010)
- chondroitin sulfate proteoglycan binding (2010)
- chondroitin sulfate binding (2010)
- regulation (includes positive and negative regulation child terms) of nematode larval development (2010)
- regulation of (includes positive and negative regulation terms) dauer larval development (2010)
2009
- response to drug withdrawal (2009)
- phosphatidylserine exposure on apoptotic cell surface (2009)
2008
- regulation of synaptic vesicle priming (2008)
- chloride-activated potassium channel activity (2008)
- transdifferentiation (2008)
- Regulation of ovulation terms (2008)
- Process terms for gap junction proteins (2008)
- piRNA and 21U-RNA terms (2008)
2007
- dense body (sensu Nematoda) cellular component term (2007)
- GO:0000775, GO:0000779, GO:0000780
- D/V and A/P axon guidance terms (2007)
- palmitoyl-CoA 9-desaturase activity (2007)
- response to hyperoxia (2007)
- Cuticle component terms (2007)
- response to anoxia (2007)
2006
- dynein light intermediate chain binding (2006)
- Regulation terms for cell and nuclear division (2006)
- Several child terms for apoptosis (2006)
2005
- Cilium terms (2005)
2004
- Intraflagellar transport particle-component terms (2004)
- oogenesis (non-species specific term) (2004)
Modifications to the Ontology
- Revised definition for muscle homeostasis (2010)
- Added dense core vesicle synonym to dense core granule (2010)
- Updated definition and moved parentage for intraflagellar transport (2009)
- Added lethargus as synonym for sleep (2008)
- Change to the definitions of the component terms: GO:0000775, GO:0000779, GO:0000780 which refer to the centromeres or chromosome, pericentric region (2007)
- Change to parent of tail tip morphogenesis (sensu Nematoda) (2006)
- GO:0046536, dosage compensation complex definition (2006)
Annotation Practices
Cellular Component Annotations
If a protein contains a transmembrane domain, but expression experiments are not at sufficient resolution to show membrane localization, what annotation should we make?
Example: WBPaper00036024
WormBase use of Column 16
Column 16 refers to a column in the Gene Ontology's (GO) tab-delimited gene association file (GAF) that WormBase submits to the GO Consortium on a regular basis.
Column 16 has been referred to as the Annotation Extension column in that it provides a placeholder for curation details that cannot be captured by a GO term alone, for example the substrate upon which an enzyme acts.
A number of different types of information could conceivably be entered into Column 16. The list below begins to document the potential use of Column 16 by WormBase curators with any additional information or questions that have arisen during the course of curation.
In the GAF, there will be an explicit relationship between the entity in Column 16 and the GO term. The annotation extension relations are viewable here:
http://www.geneontology.org/scratch/xps/go_annotation_extension_relations.obo
Column 16 curation at WormBase is just beginning and will likely be fleshed out more fully over the next few months.
In the Ontology Annotator, Column 16 data is being entered into the 'Xref to' field in the following format: Column 16: Xref ID
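As a rough illustration of where Column 16 sits in the file, the sketch below splits one GAF 2.0 line on tabs and reads the sixteenth field, where each entry is written as relation(DB:ID), with commas joining entries that apply together and pipes separating alternatives. The example line reuses the sup-26 annotation from this page, but the gene ID, reference, evidence code, and date in it are placeholders assumed for the example, and the function name is hypothetical.

```python
import re

# One extension entry looks like: relation(DB:ID)
EXTENSION_RE = re.compile(r"^([A-Za-z_]+)\(([^()]+)\)$")


def parse_annotation_extension(gaf_line):
    """Return Column 16 as a list of alternatives; each is a list of (relation, id)."""
    columns = gaf_line.rstrip("\n").split("\t")
    if len(columns) < 16 or not columns[15]:
        return []  # no annotation extension on this line
    alternatives = []
    for alternative in columns[15].split("|"):
        pairs = []
        for item in alternative.split(","):
            match = EXTENSION_RE.match(item.strip())
            if match is None:
                raise ValueError(f"malformed annotation extension: {item!r}")
            pairs.append((match.group(1), match.group(2)))
        alternatives.append(pairs)
    return alternatives


# 17 tab-delimited GAF 2.0 columns. The gene ID, reference, evidence code and
# date are placeholders for illustration, not curated values.
columns = [
    "WB", "WBGene_PLACEHOLDER", "sup-26", "", "GO:0017148",
    "WB_REF:PLACEHOLDER", "IMP", "", "P", "", "", "gene",
    "taxon:6239", "20110101", "WB",
    "has_regulation_target(WB:WBGene00006605)", "",
]
parsed = parse_annotation_extension("\t".join(columns))
print(parsed)
```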
Biological Process Examples:
Translational Regulation
Example 1: sup-26 is annotated to GO:0017148, negative regulation of translation. The entry in Column 16 is the target of that regulation, tra-2.
In OA entry: Column 16: WB:WBGene00006605
[Typedef]
id: has_regulation_target
name: has_regulation_target
def: "Identifies a gene or gene product affected by a regulation BP or regulator MF." [GOC:mah]
comment: probably want to add one or two new subtypes that capture something about directness
domain: GO:0065007 ! biological regulation
range: TEMP:0000003 ! gene or gene product
is_a: OBO_REL:has_participant
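To show how the OA entry and the relation above might combine into the value written to the GAF, here is a minimal sketch. It assumes the 'Xref to' text follows the 'Column 16: Xref ID' format exactly and that the relation is supplied separately; the function name is hypothetical and this is not the actual WormBase dump code.

```python
import re

# Matches the OA 'Xref to' format described on this page: "Column 16: DB:ID"
OA_ENTRY_RE = re.compile(r"^Column 16:\s*(\S+:\S+)$")


def oa_entry_to_extension(oa_entry, relation):
    """Combine an OA 'Column 16: DB:ID' entry with a relation name."""
    match = OA_ENTRY_RE.match(oa_entry.strip())
    if match is None:
        raise ValueError(f"not a 'Column 16: DB:ID' entry: {oa_entry!r}")
    return f"{relation}({match.group(1)})"


extension = oa_entry_to_extension(
    "Column 16: WB:WBGene00006605", "has_regulation_target"
)
print(extension)
```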
mRNA Processing
Example: sup-12 mutations affect splicing of unc-60 transcripts.
Reference: WBPaper00024604
Defense Response
Example 1: lys-7 is required for defense response to Cryptococcus neoformans
In OA entry: Column 16: NCBI:192011 (a taxon ID)
Response to Terms
Example 1: daf-2 is shown to be involved in response to oxidative stress by treating animals with paraquat. WBPaper00005488
In OA entry: annotate to 'response to oxidative stress' using CHEBI:34905
Cell Fate Specification
Example 1: egl-38 is found to be required for cell fate specification in the male tail. WBPaper00002924
Could add a number of Anatomy Terms to Column 16 (not done yet).
Regulation of Protein Localization
Example 1: hmp-1 and jac-1;hmp-1 double mutants are shown to affect the distribution of HMR-1. WBPaper00005972
Added WBGene ID of HMR-1 to Column 16.
Molecular Function Examples:
Nucleic Acid Binding
Example 1: sup-26 is annotated to GO:0003730, mRNA 3'-UTR binding. The entry in Column 16 is the target of that binding, tra-2.
In OA entry: Column 16: WB:WBGene00006605
Plans/Projects in progress
Changes to the GO data model
- Add tags for accommodating data in WormBase that are already in the gene association file:
- Qualifying an annotation with the qualifiers 'NOT', 'contributes_to', or 'colocalizes_with'
- Using the generic GO_REF tags for generic references, e.g., for a NOT annotation; need to add the proper database and accession syntax (need to add a field in the curation interface in OA).
- 'With' or 'From', for the use of additional identifiers with the use of certain evidence codes like IPI, IGI, etc.
- Annotation Extension, for containing cross references to other ontologies, one of:
- DB:gene_id
- DB:sequence_id
- CHEBI:CHEBI_id
- Cell Type Ontology:CL_id
- GO:GO_id
- Gene Product Form ID, a canonical entry for specific variants of gene products.
- When the gene product form ID (column 17 of ga) is filled with a protein identifier, the value in DB object type (column 12 of ga) must be protein. Protein identifiers can include UniProtKB accession numbers, NCBI NP identifiers or Protein Ontology (PRO) identifiers.
- When the gene product form ID (column 17 of ga) is filled with a functional RNA identifier, the DB object type (column 12 of ga) must be either ncRNA, rRNA, tRNA, snRNA, or snoRNA.
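The two consistency rules for columns 12 and 17 could be checked along the following lines. This is a sketch under stated assumptions: the list of protein prefixes is a guess at how UniProtKB, NCBI NP, and PRO identifiers appear in the file, and any other non-empty column 17 value is treated as a functional RNA identifier, which a real check would need to verify.

```python
# Assumed prefixes for protein identifiers (UniProtKB, NCBI NP, Protein Ontology).
PROTEIN_PREFIXES = ("UniProtKB:", "NCBI_NP:", "PR:")
FUNCTIONAL_RNA_TYPES = {"ncRNA", "rRNA", "tRNA", "snRNA", "snoRNA"}


def check_gene_product_form(db_object_type, gene_product_form_id):
    """Return a list of problems with columns 12 and 17 (empty if consistent)."""
    if not gene_product_form_id:
        return []  # column 17 is optional
    if gene_product_form_id.startswith(PROTEIN_PREFIXES):
        if db_object_type != "protein":
            return [
                f"column 17 is a protein ID ({gene_product_form_id}) "
                f"but column 12 is '{db_object_type}', expected 'protein'"
            ]
        return []
    # Assumption: anything else in column 17 is a functional RNA identifier.
    if db_object_type not in FUNCTIONAL_RNA_TYPES:
        return [
            f"column 17 is an RNA ID ({gene_product_form_id}) "
            f"but column 12 is '{db_object_type}', expected one of "
            f"{sorted(FUNCTIONAL_RNA_TYPES)}"
        ]
    return []


# Placeholder identifiers, for illustration only.
print(check_gene_product_form("gene", "UniProtKB:PLACEHOLDER"))
print(check_gene_product_form("tRNA", "WB:RNA_PLACEHOLDER"))
```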
Model change
in ?Gene
GO_annotation ?GO_term XREF Gene ?GO_code #GO_annotation_info
#GO_annotation_info  Annotation_extension  Text ?Gene          #This corresponds to Column 16 in GAF2.0. Text will be populated with
                                                               #a value from the OBO relations file.
                                                               #This tag will likely be expanded in the future, as
                                                               #more relations are introduced.
                                           Text ?Molecule
                                           Text ?Anatomy_term
                     Gene_product_form     ?Protein            #This corresponds to Column 17 in GAF2.0 and will be used to indicate that
                                                               #an annotation applies only to a specific protein isoform or transcript.
                                           ?Transcript
                     Qualifier             Not
                                           Contributes_to
                                           Colocalizes_with
                     Annotation_with_from  Text                #This corresponds to Column 8 in GAF2.0; used with specific evidence codes only.
                     GO_ref                Text                #GO_REF ID used for ND, some ISS, PAINT annotations, etc.
                     Reference             ?Paper              #Reference used for literature-based annotations.
                     Database              ?Database ?Database_field ?Accession_number Text  #To source annotations from outside groups like UniProt.
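For readers more used to code than to ACeDB models, the proposed tags can be pictured as a small record whose fields map onto GAF 2.0 columns. The sketch below is an assumption-laden illustration: the class and method names are hypothetical, only the Qualifier, With/From, and reference tags are rendered, and the mapping of Reference and GO_ref to column 6 is an interpretation of the model comments rather than part of the model.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Model tag -> qualifier string as it appears in GAF column 4.
GAF_QUALIFIERS = {
    "Not": "NOT",
    "Contributes_to": "contributes_to",
    "Colocalizes_with": "colocalizes_with",
}


@dataclass
class GOAnnotationInfo:
    annotation_extension: List[str] = field(default_factory=list)  # column 16
    gene_product_form: Optional[str] = None                        # column 17
    qualifiers: List[str] = field(default_factory=list)            # column 4
    annotation_with_from: List[str] = field(default_factory=list)  # column 8
    go_ref: Optional[str] = None      # GO_REF ID (e.g. for ND annotations)
    reference: Optional[str] = None   # paper for literature-based annotations

    def gaf_qualifier(self):
        """Render the Qualifier tags as a pipe-separated column 4 value."""
        unknown = [q for q in self.qualifiers if q not in GAF_QUALIFIERS]
        if unknown:
            raise ValueError(f"unknown qualifier tag(s): {unknown}")
        return "|".join(GAF_QUALIFIERS[q] for q in self.qualifiers)

    def gaf_with_from(self):
        """Render With/From identifiers as a pipe-separated column 8 value."""
        return "|".join(self.annotation_with_from)

    def gaf_reference(self):
        """Prefer the paper; fall back to the GO_REF ID (assumed precedence)."""
        return self.reference or self.go_ref or ""


# Placeholder values, for illustration only.
info = GOAnnotationInfo(qualifiers=["Not"], go_ref="GO_REF:PLACEHOLDER")
print(info.gaf_qualifier(), info.gaf_reference())
```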
Changes to the GO_term model and updating the ontology in WormBase
Progress Report 2011
Back to Caltech documentation