Updating ontology (.obo) files for the OA
Revision as of 22:33, 4 September 2013
This obo file is used to display term info for variations, transgenes, strains, clones, rearrangements and genes in any OA interface that contains these objects. It needs to be updated with every release of acedb. The obo file is created from AQL queries of the latest WS.
In the Phenotype OA, all object fields except strain should be autocomplete drop-down lists. The files used to populate these fields are in an obo-like format: each object has attached information that shows up in the term info box when the object is selected. Keeping the files updated from acedb and showing this information in the term info box helps during curation, because it verifies the identity of the object being curated and saves the curator from having to look up and verify the information manually. Although these files are not technically 'obo files', the term is used here for any flat file that contains a list of terms with accompanying information for display in the term info window. This is in contrast to other flat files that contain only a simple list of terms.
obo files for the phenotype OA
The following fields use an obo file. The name, source and generating script of each obo file are noted.
- Pub field -> paper.obo
- Person field
- Variation -> obo_oa_ontology
- Transgene -> obo_oa_ontology
- Rearrangement -> obo_oa_ontology
- Caused by -> WBGene
- Phenotype -> phenotype.obo
- Molecule -> molecule.obo
- Anatomy -> WBbt.obo
- Life stage -> worm_development.obo
- Child of -> phenotype.obo
- Laboratory evidence
- Entity -> chebi.obo, rex.obo, gene_ontology_ext.obo
- Quality -> quality.obo
obo OA ontologies
This script runs daily at 3am and populates the obo tables:
/home/postgres/work/pgpopulation/obo_oa_ontologies/update_obo_oa_ontologies.pl
The obo tables are populated by downloading:
Phenotype term: http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/phenotype_ontology_obo.cgi
Molecule ChEBI: ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.obo
Gene ontology: http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology_ext.obo
Anatomy: http://obo.cvs.sourceforge.net/*checkout*/obo/obo/ontology/anatomy/gross_anatomy/animal_gross_anatomy/worm/worm_anatomy/WBbt.obo
Life stage: http://www.berkeleybop.org/ontologies/obo-all/worm_development/worm_development.obo
PATO entity: http://www.berkeleybop.org/ontologies/obo-all/rex/rex.obo
PATO quality: http://www.berkeleybop.org/ontologies/obo-all/quality/quality.obo
Across all ontologies, the script marks every "is_obsolete: true" entry in red.
It also calls:
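The red marking of obsolete terms can be pictured with a minimal Python sketch. This is not the code in update_obo_oa_ontologies.pl: the HTML span used for the red colour and the function name `highlight_obsolete` are assumptions made for illustration, and the sample stanza is made up.

```python
import re

# Assumed markup for "red"; the real script may use different markup.
RED_OPEN = '<span style="color:red">'
RED_CLOSE = "</span>"


def highlight_obsolete(term_info: str) -> str:
    """Wrap every 'is_obsolete: true' occurrence in red markup."""
    return re.sub(
        r"is_obsolete: true",
        RED_OPEN + "is_obsolete: true" + RED_CLOSE,
        term_info,
    )


# Made-up term stanza, for illustration only.
stanza = "id: EXAMPLE:0000001\nname: example term\nis_obsolete: true"
highlighted = highlight_obsolete(stanza)
print(highlighted)
```

Text without an obsolete flag passes through unchanged, so the same function could be applied to every term of every ontology.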
Incremental update: http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=AddToVariationObo
For the variation obo files:
On the 1st of the month a full variation obo update is run.
Full update: http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=UpdateVariationObo
This script looks at the files in /home/acedb/jolene/WS_AQL_queries and the new Gene Class file to populate the obo_*_geneclass tables.
Any other day a short version of the update is run (only looks at the variation nameserver)
http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=AddToVariationObo
To launch the full update manually, log in to tazendra as acedb and run:
wget "http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=UpdateVariationObo"
Because the full update takes so long, launch it only when nobody needs the ontologies for work in the OA.
--kjy 22:33, 4 September 2013 (UTC)
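The day-of-month rule described above (full update on the 1st, short update on any other day) can be sketched in Python as follows. The function names are hypothetical and this is not the logic of the actual script; only the base URL and the two action names are taken from this page.

```python
from datetime import date

# URL of generic.cgi as given on this page.
BASE_URL = "http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi"


def variation_obo_action(today: date) -> str:
    """Full update on the 1st of the month, short update on any other day."""
    return "UpdateVariationObo" if today.day == 1 else "AddToVariationObo"


def variation_obo_url(today: date) -> str:
    """Build the URL that would be requested on the given day."""
    return f"{BASE_URL}?action={variation_obo_action(today)}"


print(variation_obo_url(date(2013, 9, 1)))
print(variation_obo_url(date(2013, 9, 4)))
```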
AQL Queries for updating the variation obo
Instructions for retrieving object connections from the latest WS build.
Grab the latest build from spica.
Variation-gene, variation-paper connections are used to select for variations of type allele or those that affect CDS (can include transposon and polymorphism alleles). This is necessary to keep the size of the variation autocomplete file manageable and not slow down the loading and using of the OA.
Variation_gene connections
Find all variations in the allele group (excludes SNPs etc.) along with the WBGeneID and public gene name of the gene they are assigned to, if that is available. You will be making a file named Variation_gene.txt that is a combination of vargene.txt and transposons.txt.
- vargene.txt
select a, a->gene, a->gene->public_name, a->reference from a in class variation where exists_tag a->allele
Export as vargene.txt (set the Separator character to blank (TAB)).
- transposons.txt
select t, t->gene, t->gene->public_name, t->reference from t in class variation where exists_tag t->transposon_insertion and exists t->gene
Export as above as transposons.txt.
- Make Variation_gene.txt by copying and pasting transposons.txt to the end of vargene.txt and saving as Variation_gene.txt
- total_variations.txt
select a from a in class variation where exists a->variation_type
Export as total_variations.txt as above. This file is required for building an exclusion list that filters out SNPs, and is referred to as a junk list.
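The copy-and-paste step that builds Variation_gene.txt could also be scripted. The following Python sketch is one way to do it, under the assumption that both exports are plain tab-separated text; the function name `merge_exports` and the demo rows are made up, and skipping blank lines is a choice of this sketch rather than something the procedure requires.

```python
from pathlib import Path
import tempfile


def merge_exports(vargene: Path, transposons: Path, out: Path) -> int:
    """Write the vargene rows followed by the transposons rows; return the row count."""
    rows = []
    for source in (vargene, transposons):
        for line in source.read_text().splitlines():
            if line.strip():  # skip blank lines left over from the export
                rows.append(line)
    out.write_text("\n".join(rows) + "\n")
    return len(rows)


# Demo in a temporary directory with made-up rows.
with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / "vargene.txt").write_text("example_allele_1\tgene_id_1\tabc-1\tpaper_1\n")
    (d / "transposons.txt").write_text("example_insertion_1\tgene_id_2\txyz-2\tpaper_2\n")
    merged_count = merge_exports(
        d / "vargene.txt", d / "transposons.txt", d / "Variation_gene.txt"
    )
    merged_text = (d / "Variation_gene.txt").read_text()

print(merged_count)
```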
Obsolete: Transgene_summary_paper connections
List transgenes already linked to a paper
- transgene_summary.txt
select t, t->reference, t->summary from t in class transgene where exists t->reference
Export as transgene_summary.txt.
Rearrangement_inside_gene connections
List rearrangements with LG, 'gene inside' and 'gene outside' (public names only)
- rearr_simple.txt
select r, r->map, r->gene_inside->public_name, r->gene_outside->public_name from r in class rearrangement
Export as rearr_simple.txt.
Strain info
Added 5.17.11 for Daniela
- strains.txt
select s, s->genotype, s->location from s in class strain
Clone info
Added 5.17.11 for Daniela
- clone_info.txt
select a, a->Type, a->transgene, a->strain, a->general_remark, a->location, a->accession_number, a->reference from a in class clone
Repopulating .obo's
Two scripts use these files to update the .obo's for the OA. The scripts are on tazendra and read Variation_gene.txt, rearr_simple.txt and (now obsolete) transgene_summary_reference.txt. The files therefore need to be transferred to tazendra and named so that the scripts recognize them.
Transfer files to tazendra
scp all files to acedb@tazendra.caltech.edu:/home/acedb/jolene/WS_AQL_queries
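Because the scripts only see files with the expected names, it can help to check the export directory before transferring. The Python sketch below is an optional helper, not part of the documented procedure. The list of expected names is an assumption that simply collects the current export file names mentioned on this page; the function name `missing_exports` is hypothetical.

```python
from pathlib import Path
from typing import List
import tempfile

# Assumed list: the current (non-obsolete) export names mentioned on this page.
EXPECTED = [
    "Variation_gene.txt",
    "total_variations.txt",
    "rearr_simple.txt",
    "strains.txt",
    "clone_info.txt",
]


def missing_exports(directory: Path, expected: List[str] = EXPECTED) -> List[str]:
    """Return the expected file names that are not present in the directory."""
    return [name for name in expected if not (directory / name).is_file()]


# Demo: a temporary directory holding only two of the expected files.
with tempfile.TemporaryDirectory() as tmp:
    export_dir = Path(tmp)
    for name in ("Variation_gene.txt", "rearr_simple.txt"):
        (export_dir / name).write_text("")
    missing = missing_exports(export_dir)

print(missing)
```

An empty result means every expected name is present and the files can be copied with scp as described above.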
Scripts for repopulating the OA obo files
update_obo_oa_ontologies.pl /home/postgres/work/pgpopulation/obo_oa_ontologies/update_obo_oa_ontologies.pl
This script has two modes: incremental update and full update.
- To launch the incremental update (just adds the latest objects to the tables):
  - auto: daily cron job at 8pm when /home/acedb/jolene/WS_AQL_queries/full_run is set to NO
  - manual: launch with http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=AddToVariationObo
- To launch the full update (a full rewrite of the tables => ~8 hours):
  - auto: daily cron job at 8pm when /home/acedb/jolene/WS_AQL_queries/full_run is set to YES. The script resets the file to NO only at the end of the run, so if the script crashes in the middle of the run, the file will still be at YES.
  - manual: log on to tazendra and run wget "http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=UpdateVariationObo" Note: do not forget the quotes.
- generic.cgi?action=AddToVariationObo: run this for an 'on-the-fly' update of the variation list (quick, <40 sec). Launching http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=AddToVariationObo adds variations that were recently created on the variation name server.
- Cron job: populate_newobjects_cgi_postgres_tables.pl updates information based on Variation_gene.txt (and transgene_summary.txt, now obsolete). This script is required for posting new allele (and transgene) entries to the New objects cgi and for sending notifications to the relevant curators. Make sure the files are named accordingly or the program won't see them.
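The switch between incremental and full update through the full_run file can be sketched in Python as below. This illustrates the behaviour described above and is not the code of the cron job: the function `run_update` and its callback arguments are hypothetical, and the demo uses a temporary file instead of /home/acedb/jolene/WS_AQL_queries/full_run.

```python
from pathlib import Path
import tempfile


def run_update(flag_file: Path, full_update, incremental_update) -> str:
    """Run the update selected by the flag file and return the mode that ran."""
    mode = flag_file.read_text().strip().upper()
    if mode == "YES":
        full_update()
        # Reset only after the full update has finished, so a crash in the
        # middle of the run leaves the flag at YES.
        flag_file.write_text("NO\n")
        return "full"
    incremental_update()
    return "incremental"


# Demo with a temporary flag file and stand-in update functions.
calls = []
with tempfile.TemporaryDirectory() as tmp:
    flag = Path(tmp) / "full_run"
    flag.write_text("YES\n")
    first = run_update(flag, lambda: calls.append("full"), lambda: calls.append("incremental"))
    second = run_update(flag, lambda: calls.append("full"), lambda: calls.append("incremental"))
    flag_after = flag.read_text().strip()

print(first, second, flag_after)
```

Writing the flag back only after the full update returns is what makes a crashed run retry as a full update at the next scheduled time.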
Obsolete: to check whether the re-population scripts worked, look at the WS_current info field. The date tells you when it was last updated; it should reflect the date the script was run.
--kjy