Difference between revisions of "Updating ontology (.obo) files for the OA"

From WormBaseWiki

Revision as of 06:34, 16 May 2011

This information is for updating the object names obo files used for the phenotype OA as well as other OA interfaces.

In the Phenotype OA, all object fields except strain should be autocomplete drop-down lists. The files that populate these fields are in an obo-like format: each object carries attached information that appears in the term info box when the object is selected. Keeping these files updated from acedb and showing this information in the term info box helps curation, because it verifies the identity of the object being curated and saves the curator from having to look up and verify the information manually. Although these files are not technically 'obo files', this page uses 'obo file' for any flat file that contains a list of terms with accompanying information for display in the term info window. This is in contrast to other flat files that contain only a simple list of terms.
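As a rough illustration of what "a list of terms with accompanying information" can look like, here is a minimal Python sketch that reads [Term] stanzas from OBO-style text into a dictionary. It assumes the standard OBO stanza layout (a "[Term]" line followed by "tag: value" lines); the files actually used by the OA may differ, and the IDs, names and the "gene" tag in the sample are hypothetical.

```python
def parse_obo_terms(text):
    """Collect [Term] stanzas from OBO-style text, keyed by term id.

    Each term is a dict mapping a tag to the list of its values,
    because a tag may occur more than once in a stanza.
    """
    terms = {}
    current = None

    def flush(stanza):
        # Store a finished stanza only if it carried an id.
        if stanza and "id" in stanza:
            terms[stanza["id"][0]] = stanza

    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza begins; only [Term] stanzas are collected.
            flush(current)
            current = {} if line == "[Term]" else None
        elif current is not None and ":" in line:
            tag, _, value = line.partition(":")
            current.setdefault(tag.strip(), []).append(value.strip())
    flush(current)
    return terms


# Hypothetical sample; not taken from a real OA file.
SAMPLE = """\
format-version: 1.2

[Term]
id: WBVar_example_1
name: ex1
gene: WBGene_example_1 abc-1

[Term]
id: WBVar_example_2
name: ex2
"""

terms = parse_obo_terms(SAMPLE)
print(terms["WBVar_example_1"]["name"][0])
```

The extra tags on each term are what a term info box could display next to the term name.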

obo files for the phenotype OA

The following fields use an obo file; the name of the file, its source, and the script that generates it are noted where available.

  • Pub field -> paper.obo
  • Person field
  • Variation -> obo_oa_ontology
  • Transgene -> obo_oa_ontology
  • Rearrangement -> obo_oa_ontology
  • Caused by -> WBGene
  • Phenotype -> phenotype.obo
  • Molecule -> molecule.obo
  • Anatomy -> WBbt.obo
  • Life stage -> worm_development.obo
  • Child of -> phenotype.obo
  • Laboratory evidence
  • Entity -> chebi.obo, rex.obo, gene_ontology_ext.obo
  • Quality -> quality.obo


obo OA ontologies

The script /home/postgres/work/pgpopulation/obo_oa_ontologies/update_obo_oa_ontologies.pl runs as a cron job every day at 3am. It populates the obo tables by downloading:

  • http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/phenotype_ontology_obo.cgi
  • ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.obo
  • http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology_ext.obo
  • http://obo.cvs.sourceforge.net/*checkout*/obo/obo/ontology/anatomy/gross_anatomy/animal_gross_anatomy/worm/worm_anatomy/WBbt.obo
  • http://www.berkeleybop.org/ontologies/obo-all/worm_development/worm_development.obo
  • http://www.berkeleybop.org/ontologies/obo-all/rex/rex.obo
  • http://www.berkeleybop.org/ontologies/obo-all/quality/quality.obo (do we even use entity / quality in the phenotype OA?)

It also calls:

http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=AddToVariationObo

AQL Queries for updating the variation obo

Instructions for retrieving object connections from the latest WS build

Variation-gene and variation-paper connections are used to filter the variation term drop-down list, which keeps the autocomplete file at a manageable size. The full variation object class is filtered down to those variations that are called 'alleles' or that have a gene connection (which includes transposon variations). The files used for this filtering need to be updated with each new release. To update them, run the following AQL queries on the newest release and deposit the results on tazendra for the updating scripts.


Variation_gene connections

Find all variations in the allele group (excludes SNPs etc.) along with the WBGeneID and public gene name of the gene to which they are assigned, if available. You will be making a file named Variation_gene.txt that is a combination of vargene.txt and transposons.txt.

  • vargene.txt
select a, a->gene, a->gene->public_name, a->reference from a in class variation where exists_tag a->allele 
Export as vargene.txt (set the separator character to blank (TAB))
  • transposons.txt
select t, t->gene, t->gene->public_name, t->reference from t in class variation where exists_tag t->transposon_insertion and exists t->gene
Export as above as transposons.txt
  • Make Variation_gene.txt by appending the contents of transposons.txt to the end of vargene.txt and saving the result as Variation_gene.txt
  • total_variations.txt
select v, v->gene, v->gene->public_name, v->reference from v in class variation
Export as total_variations.txt as above

Note: total_variations.txt is required for building an exclusion list that filters out SNPs; this list is referred to as the junk list.
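The file handling described above can be sketched in Python as follows. This is only an illustration, not the script that runs on tazendra: it assumes the exports are TAB-separated with the variation name in the first column, and it assumes the junk list is simply every variation in total_variations.txt that is absent from Variation_gene.txt. The function names and sample rows are hypothetical.

```python
import csv


def read_tab_rows(path):
    """Read a TAB-separated export into a list of non-empty rows."""
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t") if row]


def combine_rows(vargene_rows, transposon_rows):
    """Rows for Variation_gene.txt: vargene rows, then transposon rows."""
    return list(vargene_rows) + list(transposon_rows)


def junk_list(total_rows, variation_gene_rows):
    """Names in the total list that are not in Variation_gene (assumed rule)."""
    kept = {row[0] for row in variation_gene_rows}
    junk = []
    for row in total_rows:
        name = row[0]
        # Multi-valued fields can repeat a variation over several rows,
        # so each name is recorded only once.
        if name not in kept and name not in junk:
            junk.append(name)
    return junk


# Hypothetical rows standing in for the exported files.
vargene = [["ex1", "WBGene_a", "abc-1", "WBPaper_x"]]
transposons = [["exTi2", "WBGene_b", "abc-2", ""]]
total = vargene + transposons + [["snp_ex3", "", "", ""], ["snp_ex3", "", "", ""]]

combined = combine_rows(vargene, transposons)
print(junk_list(total, combined))
```

With real exports, the row lists would come from read_tab_rows("vargene.txt") and so on, and the combined rows would be written back out with csv.writer using delimiter="\t".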

Transgene_summary_paper connections

List transgenes already linked to a paper

  • transpapsum.txt
select t, t->reference, t->summary from t in class transgene where exists t->reference
Export as transpapsum.txt  

Rearrangement_inside_gene connections

List rearrangements with LG, 'genes inside' and 'gene outside' (public names only)

  • rearragene.txt
select r, r->map, r->gene_inside->public_name, r->gene_outside->public_name from r in class rearrangement
Export as rearragene.txt 
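A query like the one above can return several rows for the same rearrangement when it has more than one gene inside or outside. The following Python sketch shows one way to collapse such rows into a single record per rearrangement for inspection. It assumes four TAB-separated columns in the order of the select statement; the function name and sample values are hypothetical, and this is not the script used on tazendra.

```python
def group_rearrangements(rows):
    """Group (rearrangement, map, gene_inside, gene_outside) rows by name."""
    grouped = {}
    for row in rows:
        # Pad short rows so that missing trailing cells read as empty.
        padded = list(row) + [""] * (4 - len(row))
        name, lg, inside, outside = (cell.strip().strip('"') for cell in padded[:4])
        if not name:
            continue
        entry = grouped.setdefault(name, {"map": [], "inside": [], "outside": []})
        for key, value in (("map", lg), ("inside", inside), ("outside", outside)):
            # Skip empty cells and values already recorded for this key.
            if value and value not in entry[key]:
                entry[key].append(value)
    return grouped


# Hypothetical rows standing in for rearragene.txt.
rows = [
    ["exDf1", "I", "abc-1", "xyz-2"],
    ["exDf1", "I", "abc-3", "xyz-2"],
    ["exDf2", "X"],
]
grouped = group_rearrangements(rows)
print(grouped["exDf1"]["inside"])
```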

Repopulating .obo's

Two scripts use these files to update the .obo files for the OA. The scripts are on tazendra and read the files Variation_gene.txt, transgene_summary_reference.txt and rearr_simple.txt, so the exported files need to be transferred to tazendra and renamed so that the scripts recognize them.

Transfer files to tazendra

scp all files to acedb@tazendra.caltech.edu:/home/acedb/jolene/WS_AQL_queries


Scripts for repopulating the OA obo files


  • Cron job: populate_newobjects_cgi_postgres_tables.pl updates information based on Variation_gene.txt and transgene_summary.txt. This script is required for posting new allele and transgene entries to the New objects cgi and for sending notifications to the relevant curators. (Make sure the files are named exactly as given here, or the program won’t see them.)



To check whether the repopulation scripts worked, look at the WS_current info field (http://tazendra.caltech.edu/~azurebrd/var/work/phenote/ws_current.obo). The date tells you when the file was last updated; it should match the date the script was run.
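If the WS_current file carries a standard OBO header with a "date:" line (an assumption; the file may record its update date differently), the date check could be scripted roughly as below. The OBO 1.2 date layout dd:MM:yyyy HH:mm is assumed, and the function name and sample text are hypothetical.

```python
from datetime import datetime


def obo_header_date(text):
    """Return the header 'date:' value as a datetime, or None if absent."""
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # The header ends where the first stanza begins.
            break
        if line.startswith("date:"):
            value = line[len("date:"):].strip()
            return datetime.strptime(value, "%d:%m:%Y %H:%M")
    return None


# Hypothetical header text; a real check would read the downloaded file.
sample = "format-version: 1.2\ndate: 16:05:2011 06:34\n\n[Term]\nid: example\n"
stamp = obo_header_date(sample)
print(stamp.date() if stamp else "no date header found")
```

Comparing the returned date with the day the script was run gives the same check as reading the date by eye.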



--kjy