Difference between revisions of "Gene Ontology"

From WormBaseWiki
 
(179 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 +
==File Uploads==
 +
===WormBase Uploads===
 +
====UniProtKB gpad file to .ace file====
 +
This page documents how to go from the UniProtKB gpad file produced weekly to the WormBase .ace file for upload: [[UniProtKB gpad to WormBase .ace]]
 +
====Phenotype2GO uploads to postgres====
 +
This page documents how to add new Phenotype2GO-based annotations to postgres: [[Adding new Phenotype2GO annotations to postgres]]
 +
====[[Updating_go.ace_file]]====
 +
====[[SOP_for_generating_GO_files_for_citace_and_GO_consortium_uploads]]====
 +
====[[Specifications_for_WB_gpi_file]]====
 +
====[[GO-CAM GPAD]]====
 +
 +
===GOC Uploads===
 +
====[[Noctua - Upload of WB Manual Annotations]]====
 +
 
==Manual Literature Curation==
 +
=== Noctua Models ===
 +
*[http://wiki.wormbase.org/index.php/Noctua_model_curation_tracking_table Noctua model curation tracking table]
 +
 +
===Gastrulation and Morphogenesis Modeling [[PMID:26412237]]===
 
===Reference Genome (see also [[Reference Genome Inferential Annotations]])===
 
#[[Lung Development Targets (November 2009 - February 2010)]]
Line 23: Line 41:
 
#[[GPAD to .ace file]]
#[[GPAD to .go file]]
#[[Specifications_for_WB_gpi_file]]
 
 
#[[What's staying in postgres?]]
  
Line 46: Line 63:
 
=====[[in vitro flagging]]=====
  
====Phenotype2GO pipeline (Sanger and Caltech)====
+
==== Old Phenotype2GO pipeline (Sanger and Caltech)====
 
*The old Sanger script that generates the gene_association file (from Igor's work in January 2009) was changed. Instead of an exclusion list, an 'include list' of papers (mostly large-scale, genome-wide studies) is provided to the script. This list is curator-approved and explicitly agreed upon for the propagation of GO terms to genes based on their RNAi phenotypes.
*A new script is used. To use it, invoke the script with the -includelist option, e.g.: parse_go_terms_new.pl -o gene_association.wb -rnai -include includelist.txt (this example parses only RNAi experiments; to generate the full file, also give the '-gene -var' options as before).
Line 53: Line 70:
 
*Current status: From Igor's e-mail, March 2009: I don't think the phenotype option of the inherit_go_terms script has been disabled. The script should be run without the '-variation' option, but the gene_association file still has those. Try this:
 grep -i wbpheno gene_association.WS200.wb.ce | grep -v RNAi
This is now resolved.
 +
 +
*[[November 2014 Phenotype2GO Mappings File]]
  
 
*[[Phenotype2GO Mappings File]]
Line 58: Line 77:
 
*[[Phenotype2GO Paper Inclusion List]]
*[[Phenotype2GO Analysis]]
 +
*[[20141009 - Phenotype2GO Mappings Updates]]
 +
*[[20141022 - Phenotype2GO Pipeline]]
 +
  
 
=====InterPro2GO Mappings for IEA Annotations=====
Line 64: Line 86:
  
 
==Software Development: Tools and Scripts==
====[[Reference Genome Reports - Annotation Coverage]]====
+
=== [[Gene - GO Curation Status]] ===
====[[Ontology Annotator - The GO annotation interface]]====
+
===[[Reference Genome Reports - Annotation Coverage]]===
====[[Textpresso related forms]]====
+
===[[Ontology Annotator - The GO annotation interface]]===
====[[WormBase gene association file]]====
+
===[[Textpresso related forms]]===
====[[Updating go.ace file]]====
+
===[[WormBase gene association file]]===
 +
 
  
 
==Taxon Constraints==
Line 299: Line 322:
  
 
==Plans/Projects in progress==
====?GO_annotation_info model====
+
====?GO_annotation model====
 
*The GO annotation model is becoming increasingly complex, capturing more annotation detail.
*The proposed model below would create a #GO_annotation_info hash that includes tags for the following GO fields:
+
*The proposed model below would create a ?GO_annotation class that includes tags for the following GO fields:  
**'With' or 'From', for populating identifiers used with certain evidence codes that indicate multi-entity results like IPI (Inferred from Physical Interaction), IGI (Inferred from Genetic Interaction), etc.
+
**GO_code: the evidence code used for making the GO inference
**Annotation Extension for capturing cross references to other ontologies or entities.  This information will be used to help construct the LEGO models of pathways and processes (see http://wiki.geneontology.org/index.php/LEGO_Model_Draft_Specification).
+
**Gene_rel: captures the explicit relation between a GO term and the gene product annotated, including 'NOT' relations
**Qualifier for qualifying an annotation with the GO qualifiers 'not', 'contributes_to', or 'colocalizes_with'
+
**Annotation_made_with: entities captured in the GAF With/From column for populating identifiers used with certain GO codes such as IPI (Inferred from Physical Interaction), IGI (Inferred from Genetic Interaction), IC (Inferred from Curator), etc.
**Isoform for cases where an experiment describes the activity of a specific protein isoform; this would be captured in the Annotated_isoform tag with a UniProtKB accession (or WP: ID?).
+
**Annotation_extension: for capturing annotation context using additional ontologies or entities.  Entries in the annotation extension column could be used to help construct the LEGO models of pathways and processes (see http://wiki.geneontology.org/index.php/LEGO_Model_Draft_Specification).
**Interacting taxon for dual-taxon annotations that are made, for example, from host-pathogen interactions.
+
**Annotation_isoform: used in cases where an experiment describes the activity of a specific protein isoform; currently this information is captured with a UniProtKB accession; we would like to convert these to WB WP IDs.  See tra-1 annotations for an example.
**Gene product properties allows for remarks about the completeness of annotation, inclusion of a gene and its annotations as part of special GO projects, etc.
+
**Interacting_strain and Interacting_species: for capturing the strain or species when dual-taxon GO annotations are made, for example, to annotate gene products involved in host-pathogen interactions.
*Also:
+
**Reference: if a published paper is used as evidence.
**Uses Accession_evidence to capture GO_REF IDs for annotations that do not use a published paper, but rather a documented GO curation practice (used for ND annotations, for example; see [http://www.geneontology.org/cgi-bin/references.cgi  GO References])
+
**GO_reference: to capture GO_REF IDs for annotations that do not cite a published paper, but rather a documented GO curation practice (used for ND annotations, for example; see [http://www.geneontology.org/cgi-bin/references.cgi  GO References])
**Introduces a new type of #Evidence, Assigned_by, to allow us to incorporate GO annotations from other groups such as UniProt and IntAct.
+
**Contributed_by: Uses ?Analysis objects to incorporate GO annotations from other annotation groups such as the PAINT curation efforts, UniProt, IntAct, etc. with proper attribution for the annotation source.
**Replaces the text of GO_codes with Evidence Code Ontology IDs.  This means we'll also have to include the Evidence Code Ontology in WB.
 
  
 
*[[Curation Scenarios for new GO Model - GPAD and .ace Representations]]
  
Proposed new ?GO_annotation_info hash:
+
*[[New Model Implementation Checklist]]
  
    ?Gene GO_term ?GO_term XREF Gene #GO_annotation_info 
+
=====Proposed new ?GO_annotation class for WS247=====
                                                                                                     
 
   
 
  
    ?GO_annotation_info Relation ?Text #Evidence  //This describes the relationship between the entity annotated and the GO term.
+
  ?GO_annotation Gene ?Gene XREF GO_annotation
                          GO_code ?ECO_term #Evidence
+
                  GO_term ?GO_term XREF GO_annotation
                          Annotation_made_with     ?Gene #Evidence
+
                GO_code ?GO_code
                                                  ?Motif #Evidence
+
                Annotation_relation NOT
                                                  ?RNAi       #Evidence
+
                                    colocalizes_with
                                                  ?Variation #Evidence
+
                                    contributes_to
                                                  ?Phenotype #Evidence
+
                                    enables
                                                  ?Text       #Evidence  //This could be ID from another MOD, for example.
+
                                    involved_in
                          Annotation_extension Relation ?Text ?Life_stage                 #Evidence
+
                                    part_of                               
                                              Relation ?Text ?Gene                         #Evidence
+
                Annotation_made_with Interacting_gene ?Gene                         //for IGI and IPI annotations
                                              Relation ?Text ?Database ?Database_field Text #Evidence
+
                                      Inferred_from_GO_term ?GO_term
                                              Relation ?Text ?Anatomy_term #Evidence
+
                                      Motif ?Motif
                                              Relation ?Text ?GO_term                 #Evidence
+
                                      RNAi_result ?RNAi  
                          Annotation_qualifier NOT #Evidence
+
                                      Variation ?Variation
                                              colocalizes_with #Evidence
+
                                      Phenotype ?Phenotype
                                              contributes_to #Evidence
+
                                      Database ?Database ?Database_field ?Text     //for ISS, IEA, IGI, and PAINT annotations
                          Annotation_isoform  Database  ?Database ?Database_field Text #Evidence  //e.g., UniProtKB:Q9N5D6-1, would be nice to use WP:CE
+
                Annotation_extension Life_stage_relation ?Text UNIQUE ?Life_stage
                          Interacting_taxon ?NCBITaxonomyID #Evidence
+
                                      Gene_relation ?Text UNIQUE ?Gene  
                          Gene_product_properties Text #Evidence //CV for annotation completeness, inclusion in special GO curation projects, etc.
+
                                      Molecule_relation ?Text UNIQUE ?Molecule
 +
                                      Anatomy_relation ?Text UNIQUE ?Anatomy_term
 +
                                      GO_term_relation ?Text UNIQUE ?GO_term
 +
                Annotation_isoform   ?Text ?Protein         //captures information when annotation subject is an isoform
 +
                Interacting_species  ?Species ?Strain      //dual-taxon annotations; always populate species; strain when possible
 +
                Reference ?Paper XREF GO_annotation
 +
                GO_reference ?Database ?Database_field ?Text  //for GO's internal reference IDs, e.g., GO_REF:0000014
 +
                Contributed_by ?Analysis                      //to properly credit annotations from other sources
 +
                Date_last_updated UNIQUE DateType
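
To make the proposed class more concrete, here is a minimal Python sketch of how one annotation might be written out as a .ace stanza using a handful of the tags above. This is only an illustration under stated assumptions: the function name, the object ID format, and the choice to write the relation as a bare tag line are hypothetical and are not taken from the actual dumper scripts.

```python
from typing import Optional

# Relation tags listed under Annotation_relation in the proposed model.
ALLOWED_RELATIONS = {
    "NOT", "colocalizes_with", "contributes_to",
    "enables", "involved_in", "part_of",
}


def ace_quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def go_annotation_stanza(
    object_id: str,          # hypothetical ID scheme for ?GO_annotation objects
    gene: str,
    go_term: str,
    go_code: str,
    relation: str,
    paper: Optional[str] = None,
    date_last_updated: Optional[str] = None,  # e.g. "2014-10-22"
) -> str:
    """Return one .ace stanza for a ?GO_annotation object (sketch only)."""
    if relation not in ALLOWED_RELATIONS:
        raise ValueError(f"unknown Annotation_relation tag: {relation!r}")
    lines = [
        f"GO_annotation : {ace_quote(object_id)}",
        f"Gene {ace_quote(gene)}",
        f"GO_term {ace_quote(go_term)}",
        f"GO_code {ace_quote(go_code)}",
        relation,  # assumption: the relation is written as a bare tag
    ]
    if paper is not None:
        lines.append(f"Reference {ace_quote(paper)}")
    if date_last_updated is not None:
        lines.append(f"Date_last_updated {date_last_updated}")
    # A blank line terminates the object in a .ace file.
    return "\n".join(lines) + "\n\n"
```

Rejecting relations that are not in the model's list means a misspelled tag fails before the file is read into acedb.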
  
 +
===Future Plans===
  
Additional #Evidence tag
+
Since we now share a GO curation tool with UniProt, the annotation file we get back from them will list isoforms as UniProtKB accessions (e.g., Q9N5D6-2), and we would like to be able to map these back to specific WP: identifiers.
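
As a rough illustration of the mapping step, the following Python sketch splits an isoform accession such as Q9N5D6-2 into its base accession and isoform number, then looks it up in a mapping table. The function names, the simplified accession pattern, and the mapping table itself are assumptions for illustration; no such table is defined on this page.

```python
import re
from typing import Dict, NamedTuple, Optional, Tuple

# Simplified pattern (an assumption, not the full UniProtKB accession grammar):
# optional "UniProtKB:" prefix, an accession, and an optional "-<isoform>" suffix.
ISOFORM_RE = re.compile(r"^(?:UniProtKB:)?([A-Z0-9]+)(?:-(\d+))?$")


class IsoformAccession(NamedTuple):
    accession: str
    isoform: Optional[int]  # None when no isoform suffix is present


def parse_isoform(value: str) -> IsoformAccession:
    """Split e.g. 'Q9N5D6-2' into ('Q9N5D6', 2)."""
    match = ISOFORM_RE.match(value.strip())
    if match is None:
        raise ValueError(f"not a UniProtKB-style accession: {value!r}")
    accession, isoform = match.groups()
    return IsoformAccession(accession, int(isoform) if isoform else None)


def to_wp_id(
    value: str,
    mapping: Dict[Tuple[str, Optional[int]], str],  # hypothetical lookup table
) -> Optional[str]:
    """Return the WP: identifier for an isoform accession, or None if unmapped."""
    parsed = parse_isoform(value)
    return mapping.get((parsed.accession, parsed.isoform))
```

Returning None for unmapped isoforms lets a curator review those cases instead of silently dropping them.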
  
Assigned_by Text
 
  
====?GO_term model====
+
====?GO_term model (implemented 2013-06)====
  
Proposal #1 for new ?GO_term model:
+
     ?GO_term Name ?Text
 
+
               Definition ?Text
     ?GO_term Name UNIQUE ?Text
+
               Term ?Text
               Status UNIQUE Valid
 
                            Obsolete
 
              Namespace UNIQUE Biological_process
 
                              Cellular_component
 
                              Molecular_function
 
              Alternate_id ?Text
 
               Definition UNIQUE ?Text
 
              Comment Text //Explains obsoletions, clarifies usage of some terms.
 
 
               Synonym Broad ?Text
 
               Synonym Broad ?Text
 
                       Exact ?Text
 
                       Exact ?Text
 
                       Narrow ?Text
 
                       Narrow ?Text
 
                       Related ?Text
 
                       Related ?Text
              Parent Is_a          ?GO_term XREF Is    //Except for the root terms, all terms should have this
 
                    Part_of      ?GO_term
 
              Child  Is            ?GO_term XREF Is_a
 
              Part_of              ?GO_term
 
              Regulates            ?GO_term
 
              Negatively_regulates ?GO_term
 
              Positively_regulates ?GO_term
 
              Has_part            ?GO_term
 
              Starts_during        ?GO_term
 
              Happens_during      ?GO_term
 
              Ends_during          ?GO_term
 
              Occurs_in            ?GO_term
 
              Results_in          ?GO_term
 
              Intersection_of ?GO_term    //For specifying origin of cross-products.
 
                              ?Text        //This will be a relation ontology term and external ontology term.  Eventually articulate?
 
              Consider ?GO_term            //Gives a term which may be an appropriate substitute for an obsolete term.
 
              Replaced_by ?GO_term        //Gives a term which replaces an obsolete term.       
 
              Attribute_of ?Motif                XREF GO_term  //Needed for InterPro2GO mapping.
 
                          ?Gene                XREF GO_term  //Annotated entity for manual annotations.
 
                          ?CDS                  XREF GO_term  //Annotated entity for InterPro2GO and Phenotype2GO mappings.
 
                          ?Sequence            XREF GO_term  //Still needed?
 
                          ?Transcript          XREF GO_term  //Still needed?
 
                          ?Phenotype            XREF GO_term
 
                          ?Anatomy_term        XREF GO_term 
 
                          ?Homology_group      XREF GO_term  //Still needed?
 
                          ?Expr_pattern        XREF GO_term
 
                          ?Expression_cluster  XREF GO_term
 
                          ?Picture              XREF Cellular_component
 
                          ?WBProcess            XREF GO_term
 
              Index Ancestor  ?GO_term      //Consider transitivity.  Is this what's used for web display?
 
                    Descendent ?GO_term      //Consider transitivity.  Is this what's used for web display?
 
              Version UNIQUE Text            //SVN revision number
 
 
 
 
Proposal #2 for new ?GO_term model:
 
 
    ?GO_term Name UNIQUE ?Text
 
 
               Status UNIQUE Valid
 
               Status UNIQUE Valid
 
                             Obsolete
 
                             Obsolete
               Namespace UNIQUE Biological_process
+
               Type UNIQUE Biological_process
                              Cellular_component
+
                          Cellular_component
                              Molecular_function
+
                          Molecular_function
               Alternate_id ?Text
+
               Child Instance ?GO_term XREF Instance_of
              Definition UNIQUE ?Text
+
                    Component ?GO_term XREF Component_of
              Comment Text //Explains obsoletions, clarifies usage of some terms.
+
               Parent Instance_of ?GO_term XREF Instance
              Synonym Broad ?Text
+
                    Component_of ?GO_term XREF Component
                      Exact ?Text
+
               Attribute_of Cell ?Cell XREF GO_term  
                      Narrow ?Text
+
                          Motif ?Motif XREF GO_term  //Needed for InterPro2GO mapping.
                      Related ?Text
+
                           Gene ?Gene XREF GO_term  //Annotated entity for manual annotations.
              Parent Is_a          ?GO_term XREF Is    //Except for the root terms, all terms should have this
+
                           CDS ?CDS XREF GO_term  //Annotated entity for InterPro2GO and Phenotype2GO mappings.
              Child  Is            ?GO_term XREF Is_a
+
                           Sequence ?Sequence XREF GO_term  //Still needed?
               Relationship ?RO_term ?GO_term
+
                           Transcript ?Transcript XREF GO_term  //Still needed?
              Intersection_of ?GO_term     //For specifying origin of cross-products.
+
                           Phenotype ?Phenotype XREF GO_term
                              ?RO_term ?GO_term       
+
                           Anatomy_term ?Anatomy_term XREF GO_term   
               Consider ?GO_term           //Gives a term which may be an appropriate substitute for an obsolete term.
+
                           Homology_group ?Homology_group XREF GO_term  //Still needed?
              Replaced_by ?GO_term        //Gives a term which replaces an obsolete term.       
+
                           Expr_pattern ?Expr_pattern XREF GO_term
              Attribute_of ?Motif               XREF GO_term  //Needed for InterPro2GO mapping.
+
                           Picture ?Picture XREF Cellular_component
                           ?Gene                 XREF GO_term  //Annotated entity for manual annotations.
+
                           Index Ancestor  ?GO_term       
                           ?CDS                 XREF GO_term  //Annotated entity for InterPro2GO and Phenotype2GO mappings.
+
                                Descendent ?GO_term       
                           ?Sequence             XREF GO_term  //Still needed?
+
               Version UNIQUE Text //SVN revision number
                           ?Transcript           XREF GO_term  //Still needed?
 
                           ?Phenotype           XREF GO_term
 
                           ?Anatomy_term         XREF GO_term   
 
                           ?Homology_group       XREF GO_term  //Still needed?
 
                           ?Expr_pattern         XREF GO_term
 
                          ?Expression_cluster  XREF GO_term
 
                           ?Picture             XREF Cellular_component
 
                           ?WBProcess            XREF GO_term
 
              Index Ancestor  ?GO_term      //Consider transitivity.  Is this what's used for web display?
 
                    Descendent ?GO_term      //Consider transitivity.  Is this what's used for web display?
 
               Version UNIQUE Text           //SVN revision number
 
 
 
====Evidence Code Ontology Model====
 
  
Proposal for a new ?Evidence_code object:
 
 
    ?Evidence_code  Name ?Text
 
                    Status UNIQUE Valid
 
                                  Obsolete
 
                    Namespace ?Text
 
                    Alternate_id ?Text
 
                    Definition ?Text
 
                    Comment Text
 
                    Synonym ?Text Scope_modifier UNIQUE Broad
 
                                                        Exact
 
                                                        Narrow
 
                                                        Related
 
                    DB_info    Database  ?Database  ?Database_field  ?Accession_number  Text  ##PSI-MI, GO_REF, GOECO
 
                    Relationships is_a  ?Evidence_code      //Except for the root terms, all terms should have this
 
                    Intersection_of ?GO_term
 
                                    ?Text  //This will be a relation ontology term and external ontology term. Eventually articulate? 
 
                    Created_by Text
 
                    Creation_date Text
 
                    Version UNIQUE Text
 
  
 
====Expanded IEP Evidence Code====
Line 480: Line 429:
  
 
==Papers that use C. elegans GO Annotations==
 +
[[Papers that use GO - 2015]]
  
 
*'''WBPaper00035429'''
Line 518: Line 468:
 
**40. Huang da W, Sherman BT, Lempicki RA (2009) Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature protocols 4: 44–57.
  
==GO pipeline and scripts==
+
*'''WBPaper00056066'''
====SOP for generating a gene association file====
+
**We then used the DAVID functional annotation program to identify biologic themes within the common up‐ and down‐regulated genes.
 
 
In the acedb user account on Tazendra, at /home/acedb/ranjana/GO: <br/ >
--Use ftp://ftp.sanger.ac.uk/pub/wormbase/releases/WS211/ONTOLOGY/gene_association.WS211.wb.ce <br/ >
--Use 'grep IEA gene_association.WSXXX.wb.ce > gene_association.wb.electronic' to separate the IEAs. <br/ >
--Use 'grep WBPhenotype gene_association.WSXXX.wb.ce > gene_association.wb.rnai2go' to get both Erich's earlier RNAi2GO associations and the new associations based on allele phenotypes that went into WormBase WS186. <br/ >
--Copy the right go.go.<date> file from /home/acedb/ranjana/citace_upload/go_curation/go_dumper_files/ to this directory and rename it gene_association.wb.manual. <br/ >
--Get the new GOA elegans file (as of 04.02.12) for external annotations (use 'wget ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/9.C_elegans.goa'). <br/ >
--Run the ./wrapper.pl script <br/ >
Output will include the various error types <br/ >
--Run ./strip_errors_and_concatenate.pl <br/ >
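
The two grep steps above match anywhere on a line. As a hedged alternative, the Python sketch below does the same split by looking at specific GAF 2.0 columns (evidence code in column 7, With/From in column 8). The function name and the decision to match by column are assumptions, not part of the SOP; a line can land in both lists, just as it can match both greps.

```python
from typing import Dict, Iterable, List

EVIDENCE_COL = 6   # GAF 2.0 column 7 (0-based index 6): evidence code
WITH_FROM_COL = 7  # GAF 2.0 column 8 (0-based index 7): With/From


def split_gaf(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect IEA lines and WBPhenotype-based lines from a GAF file."""
    out: Dict[str, List[str]] = {"electronic": [], "rnai2go": []}
    for line in lines:
        # Skip blank lines and '!' header/comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= WITH_FROM_COL:
            continue  # malformed line; left for the error-checking scripts
        if cols[EVIDENCE_COL] == "IEA":
            out["electronic"].append(line)
        if "WBPhenotype" in cols[WITH_FROM_COL]:
            out["rnai2go"].append(line)
    return out
```

Matching on the evidence column avoids picking up lines where the string "IEA" merely appears inside another field, which a plain grep would include.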
 
 
 
Scp the generated gene association file to a local machine for post-processing and upload to the GOC
 
In the tmp directory on Maya:
 
--scp file to Maya <br/ >
 
--removed 'NOT' annotations from mtm-9, vha-2, vha-3, hsp-60, hsp-12.3, hsp-12.6. (We do not take out NOT annotations anymore) <br/ >
 
--removed header from the middle of concatenated file in two places (on top of UniProt file too, search for 'gaf-version') and placed on top of file (correct minor mistake in header--space after the $ on one of the lines) <br/ >
 
--And move the following header from the middle of file to the top of file: <br/ >
 
!Version: $Revision: $ <br/ >
 
!Organism: Caenorhabditis elegans <br/ >
 
!date:      $Date: $ <br/ >
 
!From: WormBase <br/ >
 
--Add these two lines at the bottom of header: <br/ >
 
!DataBase_Project_Name: WormBase WS215/WS216 <br/ >
 
!gaf-version: 2.0 <br/ >
 
--Remove the header 'gaf 2.0', from the top of the UniProt file <br/ >
 
--gzip file <br/ >
 
--Copy file to the tmp directory <br/ >
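
The manual header clean-up described above could be sketched in Python roughly as follows. This is an illustration only: the function name is hypothetical, and it assumes that every header line starts with '!' and that the stray UniProt header is a line beginning with '!gaf-version'.

```python
from typing import Iterable, List, Sequence


def hoist_header(lines: Iterable[str], extra_header: Sequence[str] = ()) -> List[str]:
    """Move '!' header lines to the top of a concatenated GAF file.

    Header lines found anywhere in the file are collected once each, in
    order of first appearance. Existing '!gaf-version' lines are dropped so
    that the version line can be re-added at the bottom of the header via
    ``extra_header``.
    """
    header: List[str] = []
    body: List[str] = []
    seen = set()
    for line in lines:
        text = line.rstrip("\n")
        if text.startswith("!"):
            if text.startswith("!gaf-version"):
                continue  # re-added below from extra_header
            if text not in seen:
                seen.add(text)
                header.append(text)
        elif text:
            body.append(text)
    for extra in extra_header:
        if extra not in seen:
            seen.add(extra)
            header.append(extra)
    return header + body
```

Passing the project-name line and '!gaf-version: 2.0' as `extra_header` puts them at the bottom of the header, as in the steps above.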
 
 
 
Use SVN commands to upload to the GO; also update the README file with every upload.
 
 
 
====SOP for generating a GO dump ace file====
 
  
On Tazendra, acedb account: <br/ >
+
==GO Uploads==
--run the ./wrapper.pl script at /home/acedb/ranjana/citace_upload/go_curation/ <br/ >
+
[[SOP for generating GO files for citace and GO consortium uploads]]
--./wrapper.pl dumps both go.ace and go.go files under /home/acedb/ranjana/citace_upload/go_curation/go_dumper_files/ with dates appended <br/ >
 
-- go.go.20090731 and go.ace.20090731.091726 files created under /go_dumper_files <br/ >
 
--Run the check_go_ace.pl script as './check_go_ace.pl filename'
 
./check_go_ace.pl  (NOTE: THIS SCRIPT NO LONGER RUN) <br/ >
 
then strips out errors that don't have to do with the Gene header, and puts all errors in the error_files/go.err.time (if it's in the go.ace.time format it replaces the ace part with err) <br/ >
 
--As of now the script is removing only the erroneous line but not the curator_confirmed line associated and directly under this line, which needs to be removed manually.  Need to think about this. <br/ >
 
--Run the count_stuff_for_ace.pl on the script to get the numbers
 
Note***Worked with JC to modify check_go_ace.pl, actually this script is no longer relevant and could be skipped, since we are using the OA. <br/ >
 
 
 
--scp file to maya.caltech.edu and rename file in format: <br/ >032107_WS174_go_dump.ace <br/ >
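
For reference, the file name pattern can be generated in Python as sketched below. This assumes that the leading 032107 is a date in MMDDYY form, which the page does not state explicitly; the function name is hypothetical.

```python
from datetime import date


def go_dump_filename(release: str, when: date) -> str:
    """Build a name like 032107_WS174_go_dump.ace (MMDDYY is an assumption)."""
    return f"{when:%m%d%y}_{release}_go_dump.ace"
```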
 
 
 
--Manually remove these annotations, which are actually 'NOT' annotations of: <br/ >
 
mtm-9 WBGene00003479 GO:0004438 <br/ >
 
vha-2 WBGene00006911 GO:0009790--looks like annotation was removed manually, no longer in dump <br/ >
 
vha-3 WBGene00006912  GO:0009790--looks like annotation was removed manually, no longer in dump <br/ >
 
hsp-60 WBGene00002025 GO:0009408 (added from WS194 upload) <br/ >
 
hsp-12.3 WBGene00002012 GO:0051082 (added from WS202 upload) <br/ >
 
hsp-12.6 WBGene00002013 GO:0051082 and GO:0006950 <br/ >
 
 
 
--Test file syntax and #of objects in local citace mirror on Juno: <br/ >
 
Read in file for syntax errors <br/ >
 
Count #of WBGenes, Papers, WBPersons before and after loading ace file <br/ >
 
--scp file to citace@spica.caltech.edu:/home/citace/Data_for_citace/Data_from_Ranjana/. <br/ >
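
As a quick cross-check before loading, the distinct identifiers mentioned in the .ace file itself can be counted with a short Python sketch like the one below. It is not a substitute for counting objects in the citace mirror before and after loading; the function name and the ID patterns (eight digits for WBGene and WBPaper IDs) are assumptions.

```python
import re
from typing import Dict

# Assumed identifier shapes; adjust if the ID formats differ.
ID_PATTERNS = {
    "WBGene": re.compile(r"WBGene\d{8}"),
    "WBPaper": re.compile(r"WBPaper\d{8}"),
    "WBPerson": re.compile(r"WBPerson\d+"),
}


def count_ids(ace_text: str) -> Dict[str, int]:
    """Count distinct WBGene, WBPaper and WBPerson IDs in .ace file text."""
    return {
        name: len(set(pattern.findall(ace_text)))
        for name, pattern in ID_PATTERNS.items()
    }
```

Comparing these counts between successive dumps gives an early warning if a large number of genes or papers has dropped out.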
 
 
 
The following files are submitted to the citace account on citace@spica.caltech.edu every build: <br/ >
 
To: /home/citace/Data_for_citace/Data_from_Ranjana/ <br/ >
 
1.  date_WSXXX_go_dump.ace  (dumped from postgres, from the manual curation via Phenote) <br/ >
 
2.  variation2goterm_VarID.ace. This is the file where allele names have been converted to WBVarIDs by Wen. Use this file until this data is read into Postgres. <br/ >
 
3. phenotype2go_mappings.ace  (consolidated phenotype2go mappings for any given build). <br/ >
 
4. The WSXXXGOterms.ace file that Wen dumped (change name from WS208GO.ace)
 
TO: /home/citace/Data_for_Ontology/ at citace@spica.caltech.edu <br/ >
 
 
 
NOTE: These genes were added to the paper editor, so this file is no longer put into citace manually. <br/ >
 
5. WBPaper00038491_genes.ace added genes to paper connection for Daniel Shaye <br/ >
 
 
 
Change directory to: Data_for_Ontology/, under /home/citace/. <br/ >
 
Here use 'wget' to get gene_ontology_edit.obo file from <br/ > http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology.1_2.obo. <br/ >
 
Rename file in the format: gene_ontology.WS231.obo. <br/ >
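
The download-and-rename step could also be scripted. The Python sketch below uses the URL given above and the gene_ontology.WSXXX.obo naming pattern; the function names and the release-name check are assumptions, and the download function is not exercised here.

```python
import os
import re
import urllib.request

OBO_URL = "http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology.1_2.obo"


def obo_filename(release: str) -> str:
    """Return the target file name, e.g. gene_ontology.WS231.obo."""
    if not re.fullmatch(r"WS\d+", release):
        raise ValueError(f"expected a release name like 'WS231', got {release!r}")
    return f"gene_ontology.{release}.obo"


def fetch_obo(release: str, dest_dir: str = ".", url: str = OBO_URL) -> str:
    """Download the ontology file and save it under the release-specific name."""
    target = os.path.join(dest_dir, obo_filename(release))
    urllib.request.urlretrieve(url, target)
    return target
```

For example, `fetch_obo("WS231", "/home/citace/Data_for_Ontology")` would save the file under the name used in the step above.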
 
  
 
==Problems with GO data in WormBase (as of June 2013)==
Line 597: Line 478:
  
  
== Models proposal feedback Sept 2013 ==
+
==Jenkins Validation Checks==
 
 
'''Kimberly:''' Thanks for the feedback, Paul.  Right now, we are only working on the #GO_annotation_info model, so you can ignore the GO_term models that are on the wiki.  I've made comments below. 
 
 
 
* ?RO_term and ?ECO_term - not sure what these are, both used in the models on the wiki, but no model is in existence/proposed for them? (I think ECO_term = Evidence_code, not sure for RO_term)
 
 
 
'''Kimberly:''' The ?RO_term tag is meant to include a Relations Ontology term. However, at the moment this ontology doesn't have all of the relations that are used for GO curation, so in the #GO_annotation_info model we don't use it and instead use Text for the relations.
 
 
 
I'd like to use Evidence Code Ontology terms wherever we can for GO curation.  There is a proposed model for that here, but I need to go over that model again:
 
 
 
 
 
'''?Evidence_code model''' - are you guys thinking of consolidating ?GO_code and ?AO_code into this class?
 
 
 
That way you can still use the GO_code ?Evidence_code which is what I think you were proposing?
 
?GO_code is used in lots of classes, so it wouldn't be good just to drop it/update it in one place and not all the rest....retiring the 2 models mentioned.
 
 
 
'''Kimberly:''' I think it'd be good to consolidate ?GO_code and ?AO_code if Raymond is okay with that.  Also, yes, we would then replace ?GO_code with ?Evidence_code in each of the respective models.  This might also be a good time to review what objects have GO_term tags and see if we still need those tags.
 
 
 
* There were issues with rooted tags so have made changes to the model to get it to load....see below.
 
 
 
#Evidence hash addition
 
 
 
Assigned_by Text
 
 
 
* Seems a little redundant. What is this meant to store, given that we already have "Curator_confirmed", "Person_evidence" and "Author_evidence"? Consortia have previously been added as Authors for this purpose.
 
 
 
'''Kimberly:''' The spirit of Assigned_by is closest to "Curator_confirmed" so perhaps we could just go with "Curator_confirmed" and ask Cecilia to make Person objects for the various other groups from which we will be incorporating GO annotation information?
 
 
 
 
 
I wasn't able to load the model into acedb as it was so had to tinker quite a bit. This loads but might not be optimal:
 
 
 
<pre>
 
?GO_term Name UNIQUE ?Text
         Status UNIQUE Valid
                       Obsolete
         Namespace UNIQUE Biological_process
                          Cellular_component
                          Molecular_function
         Alternate_id ?Text
         Definition UNIQUE ?Text
         Comment Text //Explains obsoletions, clarifies usage of some terms.
         Synonym Broad ?Text
                 Exact ?Text
                 Narrow ?Text
                 Related ?Text
         Parent Is_a          ?GO_term XREF Is    //Except for the root terms, all terms should have this
         Child  Is            ?GO_term XREF Is_a
         //Relationship ?RO_term ?GO_term
         Intersection_of ?GO_term    //For specifying origin of cross-products.
         //                ?RO_term ?GO_term
         Consider ?GO_term            //Gives a term which may be an appropriate substitute for an obsolete term.
         Replaced_by ?GO_term        //Gives a term which replaces an obsolete term.
         Attribute_of Motif ?Motif                XREF GO_term  //Needed for InterPro2GO mapping.
                      Gene ?Gene                XREF GO_term  //Annotated entity for manual annotations.
                      CDS ?CDS                  XREF GO_term  //Annotated entity for InterPro2GO and Phenotype2GO mappings.
                      Sequence ?Sequence            XREF GO_term  //Still needed?
                      Transcript ?Transcript          XREF GO_term  //Still needed?
                      Phenotype ?Phenotype            XREF GO_term
                      Anatomy_term ?Anatomy_term        XREF GO_term
                      Homology_group ?Homology_group      XREF GO_term  //Still needed?
                      Expr_pattern ?Expr_pattern        XREF GO_term
                      Expression_cluster ?Expression_cluster  XREF GO_term
                      Picture ?Picture              XREF Cellular_component
                      WBProcess ?WBProcess            XREF GO_term
         Index Ancestor  ?GO_term      //Consider transitivity.  Is this what's used for web display?
               Descendent ?GO_term      //Consider transitivity.  Is this what's used for web display?
         Version UNIQUE Text            //SVN revision number

#GO_annotation_info GO_Relation ?Text #Evidence  //This describes the relationship between the entity annotated and the GO term.
                    GO_code ?Evidence_code #Evidence
                    Annotation_made_with Gene ?Gene #Evidence
                                         Motif ?Motif #Evidence
                                         RNAi ?RNAi        #Evidence
                                         Variation ?Variation #Evidence
                                         Phenotype ?Phenotype #Evidence
                                         Remark ?Text        #Evidence  //This could be ID from another MOD, for example.
                    Annotation_extension Life_stage_Relation ?Text ?Life_stage                 #Evidence
                                         Gene_Relation ?Text ?Gene                         #Evidence
                                         Database_Relation ?Text ?Database ?Database_field Text #Evidence
                                         Generic_Relation ?Text ?Anatomy_term #Evidence
                                         GO_term_Relation ?Text ?GO_term                 #Evidence
                    Annotation_qualifier NOT #Evidence
                                         colocalizes_with #Evidence
                                         contributes_to #Evidence
                    Annotation_isoform  Database  ?Database  ?Database_field  Text #Evidence  //e.g., UniProtKB:Q9N5D6-1, WP:CE
                    Interacting_taxon ?NCBITaxonomyID #Evidence
                    Gene_product_properties Text #Evidence //CV for annotation completeness, inclusion in special GO curation projects, etc.

?Evidence_code  Name ?Text
                Status UNIQUE Valid
                              Obsolete
                Namespace ?Text
                Alternate_id ?Text
                Definition ?Text
                Comment Text
                Synonym Scope_modifier Broad ?Text
                                       Exact ?Text
                                       Narrow ?Text
                                       Related ?Text
                DB_info    Database  ?Database  ?Database_field  ?Accession_number  Text  //PSI-MI, GO_REF, GOECO
                Relationships is_a  ?Evidence_code      //Except for the root terms, all terms should have this
                Intersection_of GO_term ?GO_term
                                Remark ?Text  //This will be a relation ontology term and external ontology term. Eventually articulate?
                Created_by Text
                Creation_date Text
                Version UNIQUE Text
 
</pre>
 
 
 
I also took the sample data from the "NEW .ACE FILE"; it proved problematic to get any of it to load, principally because the proposed model is so untested and there were erroneous data points in there :(
 
 
 
 
 
 
 
Initial feedback continued:
 
 
 
'''GO_term model'''
 
 
 
1) " Isoform for cases where an experiment describes the activity of a specific protein isoform; this would be captured in the Annotated_isoform tag with a UniProtKB accession (or WP: ID?). " I would think that WormBase primary identifiers should be used as we want to promote our data 1st ;)
 
 
 
'''Kimberly:''' Yes, it would be great to use WP: identifiers wherever we can.  Do we currently map specific UniProt isoforms (e.g., Q9N5D6-2) to specific WP: protein entries?  I looked but couldn't find this in WP entries.
 
 
 
See, for example, the Alternative Products on this UniProt entry page:  http://www.uniprot.org/uniprot/Q9N5D6
 
 
 
Since we are now sharing a GO curation tool with UniProt, in the annotation file we get back from them, we will have isoforms listed as Q9N5D6-2.
 
 
 
 
 
 
 
2) I think I queried if this duplication is necessary once before(? might have been a different class) but if they are to store the parent child relationships then this might be a good time to clarify what they are used for and remove if appropriate.
 
<pre>
 
Parent Is_a          ?GO_term XREF Is    //Except for the root terms, all terms should have this
Child  Is            ?GO_term XREF Is_a

Index Ancestor  ?GO_term      //Consider transitivity.  Is this what's used for web display?
      Descendent ?GO_term      //Consider transitivity.  Is this what's used for web display?
 
</pre>
 
The second tag structure has also lost the XREF Ancestor/Descendent
 
 
 
'''Kimberly:''' We can ignore this ?GO_term model for now.
 
 
 
 
 
3) Proposal #2 has a lot of un-rooted tags
 
 
 
<pre>
 
Attribute_of ?Motif                XREF GO_term  //Needed for InterPro2GO mapping.
             ?Gene                XREF GO_term  //Annotated entity for manual annotations.
             ?CDS                  XREF GO_term  //Annotated entity for InterPro2GO and Phenotype2GO mappings.
             ?Sequence            XREF GO_term  //Still needed?
             ?Transcript          XREF GO_term  //Still needed?
             ?Phenotype            XREF GO_term
             ?Anatomy_term        XREF GO_term
             ?Homology_group      XREF GO_term  //Still needed?
             ?Expr_pattern        XREF GO_term
             ?Expression_cluster  XREF GO_term
             ?Picture              XREF Cellular_component
             ?WBProcess            XREF GO_term
 
</pre>
 
 
 
A tag needs adding before all the ?Class connections here.
 
 
 
 
 
'''Kimberly:''' We can ignore this ?GO_term model for now.
 
 
 
 
 
'''#(not ? as in proposal)GO_annotation_info hash:'''
 
  
1) Duplicate tag names in model are not allowed
#[[Jenkins Checks]]
  
<pre>
 
Annotation_extension Relation ?Text ?Life_stage                 #Evidence
                     Relation ?Text ?Gene                         #Evidence
                     Relation ?Text ?Database ?Database_field Text #Evidence
                     Relation ?Text ?Anatomy_term #Evidence
                     Relation ?Text ?GO_term                 #Evidence
 
</pre>
 
  
'''Kimberly:''' Do you mean that we can't have 'Relation' repeated?  In that case, we might need to then use tag names that are actually the text of the Relation, e.g. Part_of, Happens_during, Has_regulation_target, Has_direct_input, etc. ?
==Infectious Agents Used in C. elegans Papers==
  
  
2) Possible better external xref
{| border=1 cell-padding=5 cell-spacing=10
|-
! Pathogen Type
! Gram Stain
! Species
! Strain
! NCBI Taxon (strain-specific)
! References
! Comments
|-
| Bacterial
| Positive
| Bacillus thuringiensis
|
| 1428
| WBPaper00034766
|
|-
| Bacterial
| Positive
| Enterococcus faecalis
| OG1RF
| 474186
| WBPaper00028945
|
|-
| Bacterial
| Negative
| Erwinia carotovora
|
|
| WBPaper00030985
|
|-
| Bacterial
| Negative
| Photorhabdus luminescens
|
|
| WBPaper0030985
|
|-
| Bacterial
| Negative
| Pseudomonas aeruginosa
| PA14
| 652611
| WBPaper00028945
|
|-
| Bacterial
|
| Salmonella enterica serovar typhimurium
| SL1344
| 90371
| WBPaper00028945
|
|-
| Bacterial
| Negative
| Serratia marcescens
| Db11
| 273526
|
|
|-
| Bacterial
| Negative
| Shigella boydii
|
| 61
| WBPaper00041339
| Paper 41339 refers to an ATCC entry: http://www.atcc.org/products/all/9207.aspx#generalinformation that does not yet have an NCBI taxon ID.
|-
| Bacterial
| Negative
| Shigella flexneri
|
| 623
| WBPaper00041339
| Paper 41339 refers to an ATCC entry: http://www.atcc.org/products/all/12022.aspx that does not yet have an NCBI taxon ID.
|-
| Bacterial
| Positive
| Staphylococcus aureus
|
| 1280
| WBPaper00028970
|
|-
| Bacterial
| Negative
| Vibrio cholerae
| E7946
| 686
| WBPaper00041163
| Taxon ID is a parental ID of the strain cited in the paper, since the Ogawa strains in the paper don't match the entries in NCBI.
|-
| Fungal
|
| Cryptococcus neoformans
| H99
|
| WBPaper00028945
| There are a whole bunch of more specific H99 strains in NCBI, so can't make a 1:1 strain: taxon mapping here.  For GO, could default just to general C. neoformans, 5207.
|-
|}
  
===Progress Reports===
*[[New GO Progress Report Script]]
*[[2018]]

<pre>
?GO_annotation_info Annotation_made_with ?Text        #Evidence  //This could be ID from another MOD, for example.
</pre>
  
could be
 
  
<pre>
 
?GO_annotation_info Annotation_made_with Database ?Database ?Database_field ?Text #Evidence //standard tag structure for holding external DB data?
 
</pre>
 
  
This has taken quite a long time to test because, as far as I can tell, the model hasn't been tested in acedb, and the same goes for the test data :(
 
  
  
 
Back to [[Caltech documentation]]
 
 
[[Category:Curation]]
 

Latest revision as of 18:45, 13 May 2020


File Uploads

WormBase Uploads

UniProtKB gpad file to .ace file

This page documents how to go from the UniProtKB gpad file produced weekly to the WormBase .ace file for upload: UniProtKB gpad to WormBase .ace

Phenotype2GO uploads to postgres

This page documents how to add new Phenotype2GO-based annotations to postgres: Adding new Phenotype2GO annotations to postgres

Updating_go.ace_file

SOP_for_generating_GO_files_for_citace_and_GO_consortium_uploads

Specifications_for_WB_gpi_file

GO-CAM GPAD

GOC Uploads

Noctua - Upload of WB Manual Annotations

Manual Literature Curation

Noctua Models

Gastrulation and Morphogenesis Modeling PMID:26412237

Reference Genome (see also Reference Genome Inferential Annotations)

  1. Lung Development Targets (November 2009 - February 2010)

Transcription-related re-annotation

This summarizes the annotations that may need to be revised due to changes in the GO's representation of transcription.

Seven Molecular Function terms will be obsoleted. They are listed below with the number of associated manual C. elegans annotations:

  • GO:0003704 specific RNA polymerase II transcription factor activity - 8 (all Kimberly)
 ceh-24 - ISS - changed to  GO:0000981 sequence-specific DNA binding RNA polymerase II transcription factor activity 
 ceh-27 - ISS - same as above
 ceh-28 - ISS - same as above
 elt-1 - IDA - same as above
 elt-3 - IMP - for WBPaper00004593, removed MF term (no longer comfortable with IMP MF terms from this type of experiment); 
   also made corresponding BP term less granular, from positive regulation to just gene-specific transcription from pol II promoter
 elt-3 - IMP - same as for elt-3 above
 hlh-3 - IMP - for WBPaper00031977, removed MF term for same reason as above, also made BP term less granular as above
 zip-2 - IMP - for WBPaper00035891, same as above for elt-3 and hlh-3

Migration to UniProtKB Protein2GO Curation Tool

  1. UniProt-GOA syntax checking
  2. File Specifications for Downloading Manual Annotations for Protein2GO
  3. Scripts for file dumping and conversion
  4. GPAD to .ace file
  5. GPAD to .go file
  6. What's staying in postgres?

Semi-Automated Methods of Curation

Textpresso-Based Curation

GO Cellular Component Curation - MOD-Specific Pages
General specifications
dictyBase
FlyBase
TAIR (this is the older page no longer used)
TAIR_CCC
WormBase
GO Cellular Component Curation - General Issues
Processing Gene and Protein Names for Searches and Curation
Specifications for CCC Curation from Textpresso Search Page
CCC Form 2.0 Specifications
  • MFC - GO Molecular Function Curation using Textpresso
mf_hmm tool
in vitro flagging

Old Phenotype2GO pipeline (Sanger and Caltech)

  • The old Sanger script that generates the gene_association file (from Igor's work in January 2009) was changed. Instead of an exclusion list, an 'include list' of papers (mostly large-scale genome-wide studies) is provided to the script. This list is curator approved and explicitly agreed upon for the propagation of GO terms to genes based on their RNAi phenotypes.
  • To use the new script, invoke it with the -includelist option, e.g., run parse_go_terms_new.pl -o gene_association.wb -rnai -include includelist.txt (this example parses only RNAi experiments; to generate the full file, also give the '-gene -var' options as before).
  • If you invoke it with the '-acefile <filename>' option, the script will also generate Gene-GO_term connections derived from phenotypes. This is currently done by the phenotype procedure of the inherit_GO_terms.pl script.
  • The old script, inherit_GO_terms.pl, does not consult any exclusion/inclusion files. To alter Sanger's version of parse_go_terms_new.pl, a patch file was provided.
  • Current status: From Igor's e-mail, March 2009: I don't think the phenotype option of the inherit_go_terms script has been disabled. The script should be run without the '-variation' option, but the gene_association file still has those. Try this:

grep -i wbpheno gene_association.WS200.wb.ce | grep -v RNAi

This is now resolved.
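The include-list idea described in this section can be sketched as follows. This is a minimal illustration in Python, not the Perl of parse_go_terms_new.pl: the record layout (dicts with gene, go_id and paper keys), the function names, and all IDs in the demo are hypothetical placeholders, and the one-ID-per-line include-list format is an assumption.

```python
from typing import Iterable


def load_include_list(lines: Iterable[str]) -> set:
    """Collect paper IDs, one per line, skipping blank lines and '#' comments."""
    papers = set()
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            papers.add(entry)
    return papers


def filter_by_include_list(annotations, include):
    """Keep only phenotype-derived annotations whose paper is on the include list."""
    return [a for a in annotations if a["paper"] in include]


if __name__ == "__main__":
    # Placeholder IDs only; a real include list would hold curator-approved WBPaper IDs.
    include = load_include_list(["# approved large-scale RNAi studies", "PAPER_A", ""])
    annotations = [
        {"gene": "gene-1", "go_id": "GO:PLACEHOLDER1", "paper": "PAPER_A"},
        {"gene": "gene-2", "go_id": "GO:PLACEHOLDER2", "paper": "PAPER_B"},
    ]
    for kept in filter_by_include_list(annotations, include):
        print(kept["gene"], kept["go_id"], kept["paper"])
```

An include list fails closed: a paper that nobody has reviewed contributes no annotations, whereas an exclusion list would propagate GO terms from any paper that had not yet been explicitly rejected.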


InterPro2GO Mappings for IEA Annotations
Reference Genome Inferential Annotations

Software Development: Tools and Scripts

Gene - GO Curation Status

Reference Genome Reports - Annotation Coverage

Ontology Annotator - The GO annotation interface

Textpresso related forms

WormBase gene association file

Taxon Constraints

From Chris Mungall, 8/19/2011:

The taxon checks are run weekly, and the reports deposited here:

   http://www.geneontology.org/quality_control/annotation_checks/taxon_checks/

Note that this service will be subsumed into a more comprehensive annotation QC service (apologies if you weren't at the USC meeting, where this was demoed). This is, in general, the plan for many of the ad-hoc scripts and cron reports we perform now. I will send an email to the GOC list next week describing the roll-out process for this.

For the QC checks, the idea is to push the checking as far upstream as possible. A weekly report is too reactive. This could be done at the time of submission. Even better, the annotation tool could use the central web service at the time of annotation.

WormBase contributions to Gene Ontology content

2013
  • L-lysine transport
  • L-arginine transport
  • L-histidine transport
  • early endosome to recycling endosome transport
  • corrected definition of recycling endosome
2012
  • nematode larval development, heterochronic
  • regulation of nematode larval development, heterochronic
  • transforming growth factor receptor signaling pathway involved in multicellular organism growth
  • insulin receptor signaling pathway involved in determination of adult lifespan
  • positive, negative regulation of oviposition
  • pairing center
  • muscle projection, muscle projection membrane (narrow synonyms: myopodia, muscle arm)
  • regulation of synaptic plasticity by receptor localization to synapse
  • regulation of basement membrane organization
  • regulation of RNA interference
  • regulation, positive, negative of oocyte maturation
  • incorrect InterPro2GO mapping for IPR003131
  • dishabituation
  • double-stranded DNA-dependent ATPase activity
  • new representation of tail tip morphogenesis
  • synonym for nuclear inner membrane
  • change definition of apical junction complex
  • regulation, positive, negative of serine-type endopeptidase activity
  • regulation, positive, negative of neuromuscular synaptic transmission
2011
  • protein binding - SUMO conjugating enzyme
  • regulation of neuron migration
  • basement membrane assembly involved in embryonic body morphogenesis
  • parentage of dauer larval development - also include dormancy process
  • regulation of ATP biosynthetic process
  • regulation, positive, negative of dipeptide transport
  • regulation of phospholipid transport
  • regulation, positive, negative of endocytic recycling
  • suggested change to InterPro2GO mapping for GoLoco motif
  • GABAergic neuron differentiation
  • pre-mRNA binding
  • nitric oxide sensory activity
  • age-dependent behavioral decline
  • regulation, positive, negative of anterograde axon cargo transport and retrograde axon cargo transport
  • aggrephagy
  • germ cell proliferation
  • in progress - centrosome maturation - when, what, how
  • defecation motor program
  • modifications to terms and definitions of cilium assembly and sensory cilium assembly
  • ciliary transition zone
  • regulation and pos/neg regulation of microtubule motor activity
  • nickel ion homeostasis and cellular nickel ion homeostasis
  • neurotransmitter receptor catabolic process
2010
  • regulation of defecation, positive and negative children (2010)
  • mitochondrial prohibitin complex (2010)
  • cilium terms (2010, updates/revisions to terms added in 2005)
  • octopamine/tyramine signaling involved in the response to food (and the regulation terms) (2010)
  • alpha-tubulin acetylation (2010)
  • phagosome maturation involved in apoptotic cell clearance (2010)
  • phagosome acidification involved in apoptotic cell clearance (2010)
  • phagolysosome assembly involved in apoptotic cell clearance (2010)
  • phagosome-lysosome docking involved in apoptotic cell clearance (2010)
  • phagosome-lysosome fusion involved in apoptotic cell clearance (2010)
  • neuropeptide receptor binding (2010)
  • striated muscle contraction involved in embryonic body morphogenesis (2010)
  • striated muscle myosin thick filament assembly (2010)
  • striated muscle paramyosin thick filament assembly (2010)
  • determination of left/right asymmetry in the nervous system (2010)
  • regulation of locomotion (including positive and negative regulation child terms) involved in locomotory behavior (2010)
  • detoxification of arsenic (2010)
  • chondroitin sulfate proteoglycan binding (2010)
  • chondroitin sulfate binding (2010)
  • regulation (includes positive and negative regulation child terms) of nematode larval development (2010)
  • regulation of (includes positive and negative regulation terms) dauer larval development (2010)
2009
  • response to drug withdrawal (2009)
  • phosphatidylserine exposure on apoptotic cell surface (2009)
2008
  • regulation of synaptic vesicle priming (2008)
  • chloride-activated potassium channel activity (2008)
  • transdifferentiation (2008)
  • Regulation of ovulation terms (2008)
  • Process terms for gap junction proteins (2008)
  • piRNA and 21U-RNA terms (2008)
2007
  • dense body (sensu Nematoda) cellular component term (2007)
  • GO:0000775, GO:0000779, GO:0000780
  • D/V and A/P axon guidance terms (2007)
  • palmitoyl-CoA 9-desaturase activity (2007)
  • response to hyperoxia (2007)
  • Cuticle component terms (2007)
  • response to anoxia (2007)
2006
  • dynein light intermediate chain binding (2006)
  • Regulation terms for cell and nuclear division (2006)
  • Several child terms for apoptosis (2006)
2005
  • Cilium terms (2005)
2004
  • Intraflagellar transport particle-component terms (2004)
  • oogenesis (non-species specific term) (2004)
Modifications to the Ontology
  • Revised definition for muscle homeostasis (2010)
  • Added dense core vesicle synonym to dense core granule (2010)
  • Updated definition and moved parentage for intraflagellar transport (2009)
  • Added lethargus as synonym for sleep (2008)
  • Change to the definitions of the component terms: GO:0000775, GO:0000779, GO:0000780 which refer to the centromeres or chromosome, pericentric region (2007)
  • Change to parent of tail tip morphogenesis (sensu Nematoda) (2006)
  • GO:0046536, dosage compensation complex definition (2006)

Annotation Practices

Cellular Component Annotations

If a protein contains a transmembrane domain, but expression experiments are not at sufficient resolution to show membrane localization, what annotation should we make?

Example: WBPaper00036024


WormBase use of Column 16

Column 16 refers to a column in the Gene Ontology's (GO) tab-delimited gene association file (GAF) that WormBase submits to the GO consortium on a regular basis.

Column 16 has been referred to as the Annotation Extension column in that it provides a placeholder for curation details that cannot be captured by a GO term alone, for example the substrate upon which an enzyme acts.

A number of different types of information could conceivably be entered into Column 16. The list below begins to document the potential use of Column 16 by WormBase curators with any additional information or questions that have arisen during the course of curation.

In the GAF, there will be an explicit relationship between the entity in Column 16 and the GO term. The annotation extension relations are viewable here:

http://www.geneontology.org/scratch/xps/go_annotation_extension_relations.obo

Column 16 curation at WormBase is just beginning and will likely be fleshed out more fully over the next few months.

In the Ontology Annotator, Column 16 data is being entered into the 'Xref to' field in the following format: Column 16: Xref ID
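As a rough illustration of how an annotation extension ends up in the GAF, the sketch below parses a Column 16 string into relation/ID pairs. It assumes the GAF 2.x convention of `relation(ID)` expressions, with `,` joining extensions that apply together and `|` separating alternatives; the function name `parse_column16` is hypothetical and is not part of any WormBase or GO tool.

```python
import re

# One extension looks like: relation_name(DB:ID)
_EXTENSION = re.compile(r"^\s*([A-Za-z_]+)\((\S+?)\)\s*$")


def parse_column16(value: str) -> list:
    """Parse a Column 16 string into OR-groups of (relation, id) pairs."""
    groups = []
    for alternative in value.split("|"):
        if not alternative.strip():
            continue  # empty column or stray separator
        pairs = []
        for item in alternative.split(","):
            match = _EXTENSION.match(item)
            if match is None:
                raise ValueError(f"unrecognised extension: {item!r}")
            pairs.append((match.group(1), match.group(2)))
        groups.append(pairs)
    return groups


if __name__ == "__main__":
    # sup-26 / tra-2 example from this page, using the has_regulation_target relation.
    print(parse_column16("has_regulation_target(WB:WBGene00006605)"))
```

Keeping the relation and the identifier as separate fields makes it straightforward to check each relation against the annotation extension relations file linked above.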


Biological Process Examples:


Translational Regulation

Example 1: sup-26 is annotated to GO:0017148, negative regulation of translation. The entry in Column 16 is the target of that regulation, tra-2.

In OA entry: Column 16: WB:WBGene00006605

[Typedef]

id: has_regulation_target

name: has_regulation_target

def: "Identifies a gene or gene product affected by a regulation BP or regulator MF." [GOC:mah]

comment: probably want to add one or two new subtypes that capture something about directness

domain: GO:0065007 ! biological regulation

range: TEMP:0000003 ! gene or gene product

is_a: OBO_REL:has_participant


mRNA Processing

Example: sup-12 mutations affect splicing of unc-60 transcripts.

Reference: WBPaper00024604


Defense Response

Example 1: lys-7 is required for defense response to Cryptococcus neoformans

In OA entry: Column 16: NCBI:192011 (a taxon ID)

Response to Terms

Example 1: daf-2 is shown to be involved in response to oxidative stress by treating animals with paraquat. WBPaper00005488

In OA entry: annotate to 'response to oxidative stress' using CHEBI:34905

Cell Fate Specification

Example 1: egl-38 is found to be required for cell fate specification in the male tail. WBPaper00002924

Could add a number of Anatomy Terms to Column 16 (not done yet).

Regulation of Protein Localization

Example 1: hmp-1 and jac-1;hmp-1 double mutants are shown to affect the distribution of HMR-1. WBPaper00005972

Added WBGene ID of HMR-1 to Column 16.


Molecular Function Examples:

Nucleic Acid Binding

Example 1: sup-26 is annotated to GO:0003730, mRNA 3'-UTR binding. The entry in Column 16 is the target of that binding, tra-2.

In OA entry: Column 16: WB:WBGene00006605


Cellular Component Examples:

So far, there are three types of extensions added to Cellular Component annotations:

  1. part_of(Anatomy_term) http://wiki.geneontology.org/index.php/Annotation_Extension_Relation:part_of
  2. exists_during(Lifestage) http://wiki.geneontology.org/index.php/Annotation_Extension_Relation:exists_during
  3. exists_during(GO Biological Process) http://wiki.geneontology.org/index.php/Annotation_Extension_Relation:exists_during

Plans/Projects in progress

?GO_annotation model

  • The GO annotation model is becoming increasingly complex, capturing more annotation detail.
  • The proposed model below would create a ?GO_annotation class that includes tags for the following GO fields:
    • GO_code: the evidence code used for making the GO inference
    • Gene_rel: captures the explicit relation between a GO term and the gene product annotated, including 'NOT' relations
    • Annotation_made_with: entities captured in the GAF With/From column for populating identifiers used with certain GO codes such as IPI (Inferred from Physical Interaction), IGI (Inferred from Genetic Interaction), IC (Inferred from Curator), etc.
    • Annotation_extension: for capturing annotation context using additional ontologies or entities. Entries in the annotation extension column could be used to help construct the LEGO models of pathways and processes (see http://wiki.geneontology.org/index.php/LEGO_Model_Draft_Specification).
    • Annotation_isoform: used in cases where an experiment describes the activity of a specific protein isoform; currently this information is captured with a UniProtKB accession; we would like to convert these to WB WP IDs. See tra-1 annotations for an example.
    • Interacting_strain and Interacting_species: for capturing the strain or species when dual-taxon GO annotations are made, for example, to annotate gene products involved in host-pathogen interactions.
    • Reference: if a published paper is used as evidence.
    • GO_reference: to capture GO_REF IDs for annotations that do not cite a published paper, but rather a documented GO curation practice (used for ND annotations, for example; see GO References).
    • Contributed_by: Uses ?Analysis objects to incorporate GO annotations from other annotation groups such as the PAINT curation efforts, UniProt, IntAct, etc. with proper attribution for the annotation source.
Proposed new ?GO_annotation class for WS247
 ?GO_annotation Gene ?Gene XREF GO_annotation
                GO_term ?GO_term XREF GO_annotation
                GO_code ?GO_code
                Annotation_relation NOT 
                                    colocalizes_with 
                                    contributes_to
                                    enables
                                    involved_in
                                    part_of                                 
                Annotation_made_with Interacting_gene ?Gene                         //for IGI and IPI annotations 
                                     Inferred_from_GO_term ?GO_term
                                     Motif ?Motif
                                     RNAi_result ?RNAi 
                                     Variation ?Variation
                                     Phenotype ?Phenotype
                                     Database ?Database ?Database_field ?Text      //for ISS, IEA, IGI, and PAINT annotations
                Annotation_extension Life_stage_relation ?Text UNIQUE ?Life_stage
                                     Gene_relation ?Text UNIQUE ?Gene 
                                     Molecule_relation ?Text UNIQUE ?Molecule
                                     Anatomy_relation ?Text UNIQUE ?Anatomy_term
                                     GO_term_relation ?Text UNIQUE ?GO_term
                Annotation_isoform   ?Text ?Protein         //captures information when annotation subject is an isoform 
                Interacting_species  ?Species ?Strain       //dual-taxon annotations; always populate species; strain when possible
                Reference ?Paper XREF GO_annotation
                GO_reference ?Database ?Database_field ?Text  //for GO's internal reference IDs, e.g., GO_REF:0000014
                Contributed_by ?Analysis                      //to properly credit annotations from other sources
                Date_last_updated UNIQUE DateType
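To make the proposed class more concrete, here is a hedged Python sketch that renders one ?GO_annotation object as .ace text. The object ID, the subject gene placeholder and the IMP evidence code are assumptions for illustration; only GO:0017148 and the tra-2 target WBGene00006605 come from the sup-26 example on this page. The quoting rule (escaping embedded double quotes with a backslash) is also an assumption about .ace syntax.

```python
def ace_quote(value: str) -> str:
    """Wrap a value in double quotes, escaping embedded quotes (assumed .ace convention)."""
    return '"' + value.replace('"', '\\"') + '"'


def render_go_annotation(obj_id: str, tags: list) -> str:
    """Render one object; each tag is a tuple of (tag_name, value, value, ...)."""
    lines = [f"GO_annotation : {ace_quote(obj_id)}"]
    for tag, *values in tags:
        # Flag-style tags such as involved_in carry no values and are written bare.
        lines.append(" ".join([tag, *map(ace_quote, values)]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_go_annotation(
        "00000001",  # hypothetical object ID
        [
            ("Gene", "WBGene_SUBJECT_PLACEHOLDER"),  # placeholder for the annotated gene
            ("GO_term", "GO:0017148"),
            ("GO_code", "IMP"),  # assumed evidence code
            ("involved_in",),
            ("Gene_relation", "has_regulation_target", "WBGene00006605"),
        ],
    ))
```

A file holding several objects would need a blank line between them; the function leaves that to the caller.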

Future Plans

Since we are now sharing a GO curation tool with UniProt, in the annotation file we get back from them, we will have isoforms listed as Q9N5D6-2 and would like to be able to map them back to a specific WP: identifier.
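A minimal sketch of the isoform handling described above, in Python. It splits an accession such as Q9N5D6-2 into its base accession and isoform number, then looks it up in a caller-supplied table. The mapping table and its WP: value are hypothetical, since this page does not say where such a mapping would come from, and the function names are illustrative only.

```python
import re
from typing import Optional, Tuple

# Optional "UniProtKB:" prefix, base accession, optional "-N" isoform suffix.
_ISOFORM = re.compile(r"^(?:UniProtKB:)?([A-Z0-9]+)(?:-(\d+))?$")


def split_isoform(accession: str) -> Tuple[str, Optional[int]]:
    """Return (base accession, isoform number or None)."""
    match = _ISOFORM.match(accession.strip())
    if match is None:
        raise ValueError(f"not a UniProtKB accession: {accession!r}")
    base, number = match.groups()
    return base, int(number) if number else None


def lookup_wormpep(accession: str, mapping: dict) -> Optional[str]:
    """Look up the WP: ID for an isoform; None when no mapping is known."""
    base, number = split_isoform(accession)
    key = f"{base}-{number}" if number is not None else base
    return mapping.get(key)


if __name__ == "__main__":
    mapping = {"Q9N5D6-2": "WP:CE_PLACEHOLDER"}  # hypothetical mapping entry
    print(split_isoform("Q9N5D6-2"))
    print(lookup_wormpep("UniProtKB:Q9N5D6-2", mapping))
    print(lookup_wormpep("Q9N5D6-1", mapping))
```

Returning None for unmapped isoforms lets the calling pipeline decide whether to keep the UniProtKB accession as-is or flag the annotation for curator review.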


?GO_term model (implemented 2013-06)

    ?GO_term Name ?Text
             Definition ?Text
             Term ?Text
             Synonym Broad ?Text
                     Exact ?Text
                     Narrow ?Text
                     Related ?Text
             Status UNIQUE Valid
                           Obsolete
             Type UNIQUE Biological_process
                         Cellular_component
                         Molecular_function
             Child Instance ?GO_term XREF Instance_of
                   Component ?GO_term XREF Component_of
             Parent Instance_of ?GO_term XREF Instance
                    Component_of ?GO_term XREF Component
             Attribute_of Cell ?Cell XREF GO_term 
                          Motif ?Motif  XREF GO_term   //Needed for InterPro2GO mapping.
                          Gene ?Gene XREF GO_term   //Annotated entity for manual annotations.
                          CDS ?CDS XREF GO_term   //Annotated entity for InterPro2GO and Phenotype2GO mappings.
                          Sequence ?Sequence XREF GO_term   //Still needed?
                          Transcript ?Transcript XREF GO_term   //Still needed?
                          Phenotype ?Phenotype XREF GO_term
                          Anatomy_term ?Anatomy_term XREF GO_term   
                          Homology_group ?Homology_group XREF GO_term   //Still needed?
                          Expr_pattern ?Expr_pattern XREF GO_term
                          Picture ?Picture XREF Cellular_component
                          Index Ancestor   ?GO_term      
                                Descendent ?GO_term      
             Version UNIQUE Text //SVN revision number


Expanded IEP Evidence Code

What kinds of experiments are used to inform selection of the IEP evidence code?

  • WBPaper00006024|PMID:12869585 - Figure 2E and 2F illustrate promoter reporter fusions showing increased expression under different stress conditions. This type of experiment assays promoter activity.
    • Possible evidence code: ECO:0000296 green fluorescent protein transcript localization evidence
    • Possible new evidence code: ECO:new green fluorescent protein transcript localization evidence used in manual assertion
  • WBPaper00006024|PMID:12869585 - Figure 5F and 5G illustrate a translational fusion showing increased expression under different stress conditions. The time frame of the response suggests that this is a posttranscriptional event, possibly the protein moving from the cytoplasm to the nucleus.
    • Possible evidence code: ECO:0000300 green fluorescent protein immunolocalization evidence
    • Possible new evidence code: ECO:new green fluorescent protein immunolocalization evidence used in manual assertion
  • WBPaper00026814|PMID:16166371 - Figure 1A. Experiment measures an increase in the modified (i.e., phosphorylated) form of PMK-1 in response to treatment with several different oxidative stresses, e.g., sodium arsenite, paraquat (superoxide), and t-butyl peroxide (hydrogen peroxide).
    • Possible evidence code: ECO:0000279 Western blot evidence used in manual assertion
    • Possible new evidence code: ECO:new Western blot evidence of protein modification used in manual assertion (submitted SourceForge item, 2013-06-04)

Platinum Gene Lists

Innate Immunity, Defense Response, MAPK Signaling Pathway

Progress Report 2011

Papers that use C. elegans GO Annotations

Papers that use GO - 2015

  • WBPaper00035429
    • The resulting fold-changes and p-values were then used for GOMiner [36] and Cytoscape [37,38] analyses.
    • Detailed descriptions of GOMiner and Cytoscape analyses are accessible as Supplemental methods.
    • Detailed GOMiner and Cytoscape (jActiveModules and BiNGO) analyses of strain-to-strain differences under both UV and control conditions can be found in Supplemental data files 2–5 (GOMiner) and 6–7 (BiNGO).
    • Differentially expressed transcripts (defined as absolute fold change value > 1.3, a log ratio p-value < 0.05 by Rosetta Resolver, and a log(10) intensity measurement > −0.4) along with fold-change, p-values and GO annotation are listed for each strain and year in Supplemental data file 8.
    • 3.9. Gene ontology (GO) biological processes altered 3 h post-UVC exposure
    • Since our list of UVC-regulated genes based on a minimal 1.3-fold change was relatively short (Table 1), we also carried out a jActiveModules network analysis. This algorithm can identify subnetworks highly enriched in regulated genes even if the fold-changes are not large, since it is based only on p-values [37].
    • 36. Zeeberg BR, Feng W, Wang G, Wang MD, Fojo AT, Sunshine M, Narasimhan S, Kane DW, Reinhold WC, Lababidi S, Bussey KJ, Riss J, Barrett JC, Weinstein JN. GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol. 2003;4:R28.
    • 37. Ideker T, Ozier O, Schwikowski B, Siegel AF. Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics. 2002;18 Suppl 1:S233–S240.
  • WBPaper00040880 - The 662 genes associated with the modification of 128Q-neuron dysfunction appeared to encompass a variety of biological processes (cell death, protein folding, intracellular transport, metabolic processes, response to stress, stress-activated pathways) that may have a role in neurodegenerative disease pathogenesis as suggested by their functional classification using GO annotations (Figure 3, Additional file 7: Tables S6; Additional file 8: Table S7).
    • GO enrichment tests were performed using Ontologizer v2.0.
    • Bauer S, Grossmann S, Vingron M, Robinson PN: Ontologizer 2.0--a multifunctional tool for GO term enrichment analysis and data exploration.
    • In this respect, the network-boosted data analysis of our RNAi dataset was more instructive compared to the sole use of GO annotations or gene set enrichment analysis.
  • WBPaper00040998
    • To identify pathways and molecular functions common to the genes observed by microarray analysis, we employed the gene ontology (GO) enrichment analysis using GOrilla [21]. As expected due to NHR-49’s known role in lipid biology, there was a significant overrepresentation of GO-terms for functions related to fat metabolism (Figure 2 and Table 2). We also found that pathways regulating protein processing, maturation and proteolysis were overrepresented.
    • Figure 2. Functional classification summary for the nhr-49 mutant are represented as a scatter plot using the GO visualization tool REViGO.
    • Gene ontology (GO) enrichment analysis was performed using GOrilla [21]. Each list from the limma analysis was ranked from smallest to largest p-value and analyzed for enriched biological process ontology terms found near the top of the list. Functional classification summary for the nhr-49 mutant were presented as a scatter plot using the GO visualization tool REViGO [51].
    • Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z (2009) GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics 10: 48.
    • Supek F, Bošnjak M, Škunca N, Šmuc T (2011) REVIGO Summarizes and Visualizes Long Lists of Gene Ontology Terms. PLoS ONE 6: e21800. doi:10.1371/journal.pone.0021800.
  • WBPaper00041080
  • WBPaper00041771
    • Gene ontology classification was acquired using the Database for Annotation, Visualization, and Integrated Discovery (DAVID, http://david.abcc.ncifcrf.gov/).
    • DAVID identified many biological themes among our list of STAU-1 targets. These included embryonic, larval, and reproductive development (Fig. 5B and Supplemental Dataset 1). These are consistent with Staufen’s previously characterized role in developmental patterning in Drosophila oocytes and embryos (32,33,62).
    • In addition, major GO terms associated with the human Staufen targets include cellular metabolism and cellular processes (42), and are not similar to the GO terms associated with C. elegans STAU-1 targets. We note that our studies analyzed STAU-1-associated RNAs in whole animals containing a wide array of cell types, whereas the human proteins were analyzed in a cultured cell line. Thus the biological meaning of the apparent differences in the targets of the human and worm proteins is uncertain.
  • WBPaper00042178
    • We used the Database for Annotation, Visualization and Integrated Discovery (DAVID), version 6.7, to cluster related target genes based on enriched Gene Ontology (GO) terms [39,40]
    • 39. Huang da W, Sherman BT, Lempicki RA (2009) Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic acids research 37: 1–13.
    • 40. Huang da W, Sherman BT, Lempicki RA (2009) Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature protocols 4: 44–57.
  • WBPaper00056066
    • We then used the DAVID functional annotation program to identify biologic themes within the common up‐ and down‐regulated genes.
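
The WBPaper00035429 excerpt in the list above defines differentially expressed transcripts by three cutoffs: absolute fold change > 1.3, log ratio p-value < 0.05, and log(10) intensity > −0.4. A minimal Python sketch of that filter is given here for orientation only. The record fields, the function name, and the example records are hypothetical, and the sketch assumes fold change is stored as a signed value (negative for downregulation), which the excerpt does not specify.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    gene: str               # hypothetical identifier
    fold_change: float      # assumed signed: negative means downregulated
    p_value: float          # log ratio p-value
    log10_intensity: float  # log(10) intensity measurement


def is_differentially_expressed(t, min_abs_fc=1.3, max_p=0.05,
                                min_log10_intensity=-0.4):
    """Apply the three cutoffs quoted from WBPaper00035429."""
    return (
        abs(t.fold_change) > min_abs_fc
        and t.p_value < max_p
        and t.log10_intensity > min_log10_intensity
    )


# Made-up records for illustration only; not data from the paper.
transcripts = [
    Transcript("gene-a", 1.8, 0.01, 0.5),    # passes all three cutoffs
    Transcript("gene-b", -1.5, 0.03, -0.1),  # passes: absolute fold change is used
    Transcript("gene-c", 1.2, 0.001, 1.0),   # fails the fold-change cutoff
    Transcript("gene-d", 2.0, 0.20, 0.7),    # fails the p-value cutoff
    Transcript("gene-e", 2.5, 0.01, -0.9),   # fails the intensity cutoff
]

selected = [t.gene for t in transcripts if is_differentially_expressed(t)]
print(selected)
```

All three conditions must hold at once, so a transcript with a large fold change is still dropped if its p-value or intensity misses the cutoff.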

GO Uploads

SOP for generating GO files for citace and GO consortium uploads

Problems with GO data in WormBase (as of June 2013)

Data and display problems of GO annotations in WormBase


Jenkins Validation Checks

  1. Jenkins Checks


Infectious Agents Used in C. elegans Papers

| Pathogen Type | Gram Stain | Species | Strain | NCBI Taxon (strain-specific) | References | Comments |
| Bacterial | Positive | Bacillus thuringiensis | | 1428 | WBPaper00034766 | |
| Bacterial | Positive | Enterococcus faecalis | OG1RF | 474186 | WBPaper00028945 | |
| Bacterial | Negative | Erwinia carotovora | | | WBPaper00030985 | |
| Bacterial | Negative | Photorhabdus luminescens | | | WBPaper00030985 | |
| Bacterial | Negative | Pseudomonas aeruginosa | PA14 | 652611 | WBPaper00028945 | |
| Bacterial | | Salmonella enterica serovar Typhimurium | SL1344 | 90371 | WBPaper00028945 | |
| Bacterial | Negative | Serratia marcescens | Db11 | 273526 | | |
| Bacterial | Negative | Shigella boydii | | 61 | WBPaper00041339 | Paper 41339 refers to an ATCC entry (http://www.atcc.org/products/all/9207.aspx#generalinformation) that does not yet have an NCBI taxon ID. |
| Bacterial | Negative | Shigella flexneri | | 623 | WBPaper00041339 | Paper 41339 refers to an ATCC entry (http://www.atcc.org/products/all/12022.aspx) that does not yet have an NCBI taxon ID. |
| Bacterial | Positive | Staphylococcus aureus | | 1280 | WBPaper00028970 | |
| Bacterial | Negative | Vibrio cholerae | E7946 | 686 | WBPaper00041163 | The taxon ID is that of a parent of the strain cited in the paper, since the Ogawa strains in the paper do not match any NCBI entries. |
| Fungal | | Cryptococcus neoformans | H99 | | WBPaper00028945 | NCBI lists many more specific H99 strains, so a 1:1 strain-to-taxon mapping is not possible here. For GO, could default to the general C. neoformans taxon, 5207. |
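
The comments in the table above describe a fallback: use a strain-specific NCBI taxon ID when one exists, otherwise default to a more general ID (as suggested for C. neoformans, 5207). A minimal Python sketch of that lookup follows. The dictionary and function names are hypothetical, only a few rows of the table are included, and the sketch assumes that IDs listed without a strain (1428, 1280) are species-level IDs.

```python
from typing import Optional

# Strain-specific IDs copied from the table above (subset only).
STRAIN_TAXA = {
    ("Enterococcus faecalis", "OG1RF"): 474186,
    ("Pseudomonas aeruginosa", "PA14"): 652611,
    ("Serratia marcescens", "Db11"): 273526,
}

# General IDs used as a fallback (assumed to be species-level).
SPECIES_TAXA = {
    "Bacillus thuringiensis": 1428,
    "Staphylococcus aureus": 1280,
    "Cryptococcus neoformans": 5207,  # default suggested in the table comment
}


def resolve_taxon(species: str, strain: Optional[str] = None) -> Optional[int]:
    """Return the strain-specific taxon ID if known, else the general one."""
    if strain is not None:
        taxon = STRAIN_TAXA.get((species, strain))
        if taxon is not None:
            return taxon
    # No strain given, or no 1:1 strain mapping: fall back to the general ID.
    return SPECIES_TAXA.get(species)


print(resolve_taxon("Pseudomonas aeruginosa", "PA14"))
print(resolve_taxon("Cryptococcus neoformans", "H99"))
print(resolve_taxon("Erwinia carotovora"))
```

Returning None for an unmapped species keeps missing IDs visible to the curator, rather than silently substituting a guess.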

Progress Reports



Back to Caltech documentation