Updating ontology (.obo) files for the OA
This obo file is used to display term info for variations, transgenes, strains, clones, rearrangements, and genes in any OA interface that contains these objects. It needs to be updated with every release of acedb. The obo file is created from AQL queries of the latest WS.
In the Phenotype OA, all object fields except strain should be autocomplete drop-down lists. The files used to populate these fields are in an obo-like format: each object has attached information that shows up in the term info box when the object is selected. Keeping the files updated from acedb and showing this information in the term info box helps during curation, because it verifies the identity of the object being curated and saves the curator from having to look up and verify the information manually. Although these files are not technically 'obo files', this page uses 'obo file' for any flat file that contains a list of terms with accompanying information for display in the term info window. This is in contrast to other flat files that contain only a simple list of terms.
obo files for the phenotype OA
The following fields use an obo file. Where known, the name, source, or script that generates the obo file is noted.
- Pub field -> paper.obo
- Person field
- Variation -> obo_name_variation; obo_data_variation
- Transgene -> trp tables
- Rearrangement -> obo_oa_ontology
- Caused by -> WBGene
- Phenotype -> phenotype.obo
- Molecule -> molecule.obo
- Anatomy -> WBbt.obo
- Life stage -> worm_development.obo
- Child of -> phenotype.obo
- Laboratory evidence
- Entity -> chebi.obo, rex.obo, gene_ontology_ext.obo
- Quality -> quality.obo
obo OA ontologies
See the Ontology update pipeline page for the variation, clone, strain, rearrangement, and laboratory obo tables. These files are updated from a nightly geneace dump.
Transgenes should be updated directly from the trp tables.
Variations are also updated from the
All other ontologies in the phenotype OA should be updated through the following script:
obo tables are populated by downloading:
Phenotype term: http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/phenotype_ontology_obo.cgi
Molecule ChEBI: ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.obo
Gene ontology: http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology_ext.obo
Life stage: http://www.berkeleybop.org/ontologies/obo-all/worm_development/worm_development.obo
PATO entity: http://www.berkeleybop.org/ontologies/obo-all/rex/rex.obo
PATO quality: http://www.berkeleybop.org/ontologies/obo-all/quality/quality.obo
NOTE: Across all ontologies, every "is_obsolete: true" line is changed to display in red.
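As an illustration of the note above, here is a minimal sketch of marking obsolete lines in red. It assumes the term info is rendered as HTML and that a red `<span>` is an acceptable way to color the line; the function name `mark_obsolete` and the markup are hypothetical, not taken from the actual script.

```python
import html


def mark_obsolete(term_info: str) -> str:
    """Return term info as HTML text, wrapping 'is_obsolete: true' lines in a red span.

    Assumption: the term info box accepts HTML. The real script may use
    different markup.
    """
    out = []
    for line in term_info.splitlines():
        escaped = html.escape(line)
        if line.strip() == "is_obsolete: true":
            escaped = f'<span style="color:red">{escaped}</span>'
        out.append(escaped)
    return "\n".join(out)
```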
Every day this script calls an incremental update from the variation nameserver:
http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=TempVariationObo Note: this was changed from http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=AddToVariationObo
On the 1st of the month the script calls a full variation obo update:
--kjy 22:33, 4 September 2013 (UTC)
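The update schedule described above (an incremental variation update every day, a full update on the 1st of the month) can be sketched as a small date check. This is only an illustration: the function name is hypothetical, and it assumes the incremental call still runs on the 1st alongside the full update, which the text does not state.

```python
from datetime import date


def variation_update_modes(today: date) -> list[str]:
    """Return the variation obo updates due on the given day.

    Assumption: the daily incremental update also runs on the 1st,
    in addition to the monthly full update.
    """
    modes = ["incremental"]
    if today.day == 1:
        modes.append("full")
    return modes
```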
Repopulating variation OA obo from geneace
For each variation with one of the Methods listed below, the following information will be retrieved:
- WBVar ID
- gene association
Only variations with one of the following attached methods are retrieved:
- Allele
- Deletion_allele
- Deletion_and_insertion_allele
- Deletion_polymorphism
- Insertion_allele
- Insertion_polymorhism
- KO_consortium_allele
- Mos_insertion
- NBP_knockout_allele
- NemaGENETAG_consortium_allele
- Substitution_allele
- Transposon_insertion
- Engineered_allele
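The method filter can be sketched as a simple set-membership test. The record shape (pairs of WBVar ID and Method) and the function name are assumptions made for illustration; the method names are copied verbatim from the list above, including their spelling.

```python
# Method names copied verbatim from the list in this section.
ALLOWED_METHODS = frozenset({
    "Allele",
    "Deletion_allele",
    "Deletion_and_insertion_allele",
    "Deletion_polymorphism",
    "Insertion_allele",
    "Insertion_polymorhism",  # spelling as it appears in the list above
    "KO_consortium_allele",
    "Mos_insertion",
    "NBP_knockout_allele",
    "NemaGENETAG_consortium_allele",
    "Substitution_allele",
    "Transposon_insertion",
    "Engineered_allele",
})


def filter_variations(records):
    """Keep the WBVar IDs whose Method is in ALLOWED_METHODS.

    records: iterable of (wbvar_id, method) pairs (hypothetical shape).
    """
    return [wbvar_id for wbvar_id, method in records if method in ALLOWED_METHODS]
```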
These data will populate /home/postgres/work/pgpopulation/obo_oa_ontologies/geneace files
in obo_name_variation, entries are like:
WBVar00000020 ad487 2015-11-30 20:01:08
in obo_data_variation, entries are like:
WBVar00088136 id: WBVar00088136\nname: "ju2"\nspecies: "Caenorhabditis elegans"\nstatus: "Live"\ngene: "WBGene00006363 syd-1"\nreference: "WBPaper00005543" 2015-11-30 20:01:08
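To make the two entry layouts concrete, here is a minimal parsing sketch. It assumes the three columns (ID, value, timestamp) are tab-separated and that the `\n` inside the obo_data_variation value is a literal backslash followed by `n`, as the example suggests; both points are assumptions about the storage format, and the function names are hypothetical.

```python
def parse_name_row(row: str):
    """Split an obo_name_variation entry into (WBVar ID, public name, timestamp)."""
    wbvar_id, public_name, timestamp = row.rstrip("\n").split("\t")
    return wbvar_id, public_name, timestamp


def parse_data_row(row: str):
    """Split an obo_data_variation entry into (WBVar ID, fields, timestamp).

    fields maps each tag (id, name, gene, ...) to a list of values,
    because a tag such as reference may occur more than once.
    """
    wbvar_id, term_info, timestamp = row.rstrip("\n").split("\t")
    fields = {}
    # The separator is the two-character sequence backslash + n.
    for line in term_info.split("\\n"):
        key, _, value = line.partition(": ")
        fields.setdefault(key, []).append(value.strip('"'))
    return wbvar_id, fields, timestamp
```

For the ju2 example above, `fields["gene"]` would hold `"WBGene00006363 syd-1"` with the surrounding quotes removed.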
If a variation does not exist in the geneace dump, and hence is not in the obo_name_variation or obo_data_variation tables:
- retrieve a WBVarID from the variation nameserver at http://www.sanger.ac.uk/sanger/Worm_NameServer (you will need a login and password, which may take a while to be assigned)
- enter the public name and WBVarID, separated by a space or tab, into the TempVariationObo form: http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=TempVariationObo
  Note: the form can take columns of data as long as they are in the format <allele public name> <WBVarID>.
The information is added immediately to the obo_name_variation and should be available through the OA variation field (a form reload may be necessary).
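Before pasting several entries into the form, the lines can be prepared and checked with a short helper such as the sketch below. The ID pattern (`WBVar` followed by eight digits) is inferred from the examples on this page and is an assumption, as is the function name; the form itself may accept other input.

```python
import re

# Assumption: WBVar IDs look like the examples on this page (WBVar + 8 digits).
WBVAR_RE = re.compile(r"^WBVar\d{8}$")


def format_temp_variation_lines(pairs):
    """Build '<allele public name><TAB><WBVarID>' lines for the TempVariationObo form.

    pairs: iterable of (public_name, wbvar_id). Raises ValueError on bad input.
    """
    lines = []
    for public_name, wbvar_id in pairs:
        if not public_name or any(ch.isspace() for ch in public_name):
            raise ValueError(f"bad public name: {public_name!r}")
        if not WBVAR_RE.match(wbvar_id):
            raise ValueError(f"bad WBVar ID: {wbvar_id!r}")
        lines.append(f"{public_name}\t{wbvar_id}")
    return "\n".join(lines)
```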
If the allele already has a WBVarID but does not exist in the nightly geneace dump, curators should still enter the object through generic.cgi (http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=TempVariationObo).
When the object comes through during the geneace dump, its information will be captured and overwritten in obo_data_variation.
Variations that already exist in geneace but do not come through the geneace dump (e.g., million mutation variations) will remain in obo_data_variation and still be available through the drop-down.
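The overwrite-and-keep behavior described above can be pictured as a dictionary merge: entries arriving in the geneace dump replace what is stored, and stored entries absent from the dump are left alone. This is a sketch of the described outcome only; the function name and the in-memory dictionaries are hypothetical, and the real tables live in postgres.

```python
def merge_variation_data(existing: dict, geneace_dump: dict) -> dict:
    """Merge obo_data_variation-style entries keyed by WBVar ID.

    Entries in geneace_dump overwrite existing ones. Existing entries that
    are missing from the dump (e.g. manually entered ones) are kept.
    """
    merged = dict(existing)
    merged.update(geneace_dump)
    return merged
```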
Obsolete-> To check if the re-population scripts worked, check out the WS_current info field. The date will tell you when it was last updated; it should reflect the date the script was run.
Obsolete -> Updating variation obo
This script looks at the following files in /home/acedb/jolene/WS_AQL_queries:
- Variation_gene.txt
- total_variations.txt
- clone_info.txt
- strains.txt
- rearr_simple.txt
- expr_cluster.txt
and at the new Gene Class file to populate the obo_*_geneclass tables.
To launch the full update manually log in to tazendra as acedb and do
Because it takes so long, make sure to wait until people do not require the ontologies to work in the OA.
Two scripts run off of these files to update the .obo files for the OA. The scripts are on tazendra and use Variation_gene.txt, transgene_summary_reference.txt (obsolete), and rearr_simple.txt, so the files need to be transferred to tazendra and named so that the scripts recognize them.
Transfer files to tazendra: scp all files to email@example.com:/home/acedb/jolene/WS_AQL_queries
Obsolete ->Cron job: populate_newobjects_cgi_postgres_tables.pl updates information based on Variation_gene.txt (and transgene_summary.txt). This script is required for posting new allele or transgene entries on to the New objects cgi and sending notifications to the relevant curators. (Make sure files are named accordingly or the program won’t see them).
AQL Queries for updating the variation obo
Instructions for retrieving object connections from the latest WS build. Grab the latest build from spica.
Variation-gene, variation-paper connections are used to select for variations of type allele or those that affect CDS (can include transposon and polymorphism alleles). This is necessary to keep the size of the variation autocomplete file manageable and not slow down the loading and using of the OA.
Find all variations in the allele group (excludes SNPs etc.) along with the WBGeneID and public gene name of the gene they are assigned to, if that is available. You will be making a file named Variation_gene.txt that is a combination of vargene.txt and transposons.txt.
select a, a->gene, a->gene->public_name, a->reference from a in class variation where exists_tag a->allele
Export as vargene.txt (set the Separator character to blank (TAB)).
select t, t->gene, t->gene->public_name, t->reference from t in class variation where exists_tag t->transposon_insertion and exists t->gene
Export as above as transposons.txt.
- Make Variation_gene.txt by appending the contents of transposons.txt to the end of vargene.txt and saving the result as Variation_gene.txt
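The copy-and-paste step can also be scripted. The sketch below appends the rows of transposons.txt to those of vargene.txt; dropping blank lines is an added convenience, not something the instructions require, and both function names are hypothetical.

```python
from pathlib import Path


def combine_exports(vargene_text: str, transposons_text: str) -> str:
    """Return the vargene rows followed by the transposon rows, skipping blank lines."""
    rows = []
    for text in (vargene_text, transposons_text):
        rows.extend(line for line in text.splitlines() if line.strip())
    return "\n".join(rows) + "\n"


def write_variation_gene(directory: Path) -> Path:
    """Read vargene.txt and transposons.txt in directory and write Variation_gene.txt there."""
    out = directory / "Variation_gene.txt"
    out.write_text(
        combine_exports(
            (directory / "vargene.txt").read_text(),
            (directory / "transposons.txt").read_text(),
        )
    )
    return out
```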
select a from a in class variation where exists a->variation_type
Export as total_variations.txt as above. This is required for building an exclusion list that filters out SNPs, and is referred to as a junk list.
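One plausible way to derive such a junk list is a set difference: every variation in total_variations.txt that does not appear in the first column of Variation_gene.txt. This is an assumption about how the exclusion list is built, shown only as a sketch; the actual script may apply different rules.

```python
def build_junk_list(total_variations, variation_gene_rows):
    """Return the sorted variation IDs that are in the total list but not kept.

    total_variations: iterable of lines from total_variations.txt (one ID per line).
    variation_gene_rows: iterable of tab-separated lines from Variation_gene.txt,
    with the variation ID in the first column.
    Assumption: 'junk' means everything not present in Variation_gene.txt.
    """
    kept = {row.split("\t")[0].strip() for row in variation_gene_rows if row.strip()}
    everything = {line.strip() for line in total_variations if line.strip()}
    return sorted(everything - kept)
```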
List transgenes already linked to a paper
select t, t->reference, t->summary from t in class transgene where exists t->reference
Export as transgene_summary.txt
List rearrangements with LG, 'genes inside' and 'gene outside' (public names only)
select r, r->map, r->gene_inside->public_name, r->gene_outside->public_name from r in class rearrangement
Export as rearr_simple.txt
Added 5.17.11 for Daniela
select s, s->genotype, s->location from s in class strain
Added 5.17.11 for Daniela
select a, a->Type, a->transgene, a->strain, a->general_remark, a->location, a->accession_number, a->reference from a in class clone