Difference between revisions of "Updating ontology (.obo) files for the OA"

From WormBaseWiki
(43 intermediate revisions by 2 users not shown)
Line 1: Line 1:
This SOP is for updating the Object names obo files in the OA
+
This obo file is used to display term info for variations, transgenes, strains, clones, rearrangements, genes in any OA interface that contains these objects. It needs to be updated with every release of acedb.  The obo file is created from AQL queries of the latest WS.
 +
 +
In the Phenotype OA, all object fields, except strain should be autocomplete drop down lists.  The files that are used to populate these fields are an obo-like format in that there is information attached to each object that shows up in the term info box when selected.  Keeping the file updated from acedb and showing this information in the term info box helps during curation as it verifies the identity of the object being curated and saves the curator time from having to manually look up and verify the info themselves.  These files although not technically 'obo files' will be referred to as obo files when referring to any flat file that contains a list of terms with accompanying information for display in the term info window.  This is in contrast to other flat files that only contain a simple list of terms.
  
 +
====obo files for the phenotype OA====
 +
The following fields use an obo file, the name, source and script that generates the obo file used is noted.
 +
*Pub field -> paper.obo
 +
*Person field
 +
*Variation ->obo_name_variation; obo_data_variation <br>
 +
*Transgene ->trp tables
 +
*Rearrangement ->obo_oa_ontology
 +
*Caused by -> WBGene
 +
*Phenotype ->phenotype.obo
 +
*Molecule ->molecule.obo
 +
*Anatomy ->WBbt.obo
 +
*Life stage ->worm_development.obo
 +
*Child of ->phenotype.obo
 +
*Laboratory evidence
 +
*Entity ->chebi.obo, rex.obo, gene_ontology_ext.obo
 +
*Quality ->quality.obo
  
= Updating local acedb to latest available WS (instructions from Wen) =
+
===obo OA ontologies===
 +
See Ontology update pipeline [http://wiki.wormbase.org/index.php/Source_and_maintenance_of_non-WBGene_info here] for variation, clone, strain, rearrangement, and laboratory obo table, basically these files are updated from a nightly geneace dump.<br>
 +
Transgenes should be updated directly from the trp tables.<br>
 +
Variations are also updated from the
  
You have to download the latest build from the Sanger website. From the command line (X11 on Mac OS), go to the local directory where the old WS is installed. Login anonymously to Sanger’s ftp site
+
All other ontologies in the phenotype OA should be updated through the following script:
 +
/home/postgres/work/pgpopulation/obo_oa_ontologies/update_obo_oa_ontologies.pl
 +
obo tables are populated by downloading :
  
bash-3.2$ ftp ftp.sanger.ac.uk
+
Phenotype term: http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/phenotype_ontology_obo.cgi<br>
Connected to ftpservice2.sanger.ac.uk.
+
Molecule ChEBI: ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.obo<br>
220-ftp.sanger.ac.uk NcFTPd Server (free educational license) ready.
+
Gene ontology: http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology_ext.obo<br>
220-Wellcome Trust Sanger Institute FTP server
+
Anatomy: http://obo.cvs.sourceforge.net/*checkout*/obo/obo/ontology/anatomy/gross_anatomy/animal_gross_anatomy/worm/worm_anatomy/WBbt.obo<br>
220-
+
Life stage: http://www.berkeleybop.org/ontologies/obo-all/worm_development/worm_development.obo<br>
220-Problems after login? Try using '-' as the first character of you
+
Pathogen: h_pap_species_index
220-password.
+
PATO entity: http://www.berkeleybop.org/ontologies/obo-all/rex/rex.obo<br>
220-
+
PATO quality: http://www.berkeleybop.org/ontologies/obo-all/quality/quality.obo<br>
220-****
+
NOTE: Across all ontologies all "is_obsolete: true" are changed to be red.<br>
220-****
 
220-**** 7/9/06 FTP Server upgraded please report any problems to
 
220-****    ftpadmin@sanger.ac.uk
 
220-****
 
220
 
Name (ftp.sanger.ac.uk:Yook): anonymous
 
331 Guest login ok, send your complete e-mail address as password.
 
Password:  
 
  
Go to directory containing WS releases Download whole release (takes about 1 hour) Quit ftp
 
  
  FTP&gt; cd pub/wormbase
+
Every day this script calls an incremental update from the variation nameserver:<br>
  FTP&gt; get WS188.tar
+
  http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=TempVariationObo
[or get –R WS188 for Ncfp client]
+
  Note: this was changed from http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=AddToVariationObo
FTP&gt; bye
 
  
Unzip tar Get into the new WS directory and run install to install new database (~15 minutes)
+
On the 1st of the month the script calls a full variation obo update:<br>
 +
http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=UpdateVariationObo
  
$ tar –xvvf WS188.tar
 
$ cd WS188
 
$./INSTALL
 
  
The readout after installation is as follows:  
+
--[[User:Kyook|kjy]] 22:33, 4 September 2013 (UTC)
  
ACEDB installation script:
+
===Repopulating variation OA obo from geneace===
Yook will be known as the acedb-administrator
+
For each variation with specific Method (listed in table below), the following information will be retrieved:
We are going to install the acedb system in the present directory:
+
*WBVar ID
    /Users/Yook/WS_latest/WS188
+
*public_name
This is your available disk space in this directory:
+
*gene association
Filesystem  1024-blocks      Used Available Capacity  Mounted on
+
*references
/dev/disk0s2  488050672 149925748 337868924   31%   /
+
*method
The amount of space you need will depend on what data you are installing.
+
*status
For the source code and binary, you need around 15 Mb.
+
<pre>
Should we proceed?  Please answer yes/no&nbsp;: yes
+
Only variations with one of the following attached methods are retrieved.
 +
  Allele
 +
  Deletion_allele
 +
  Deletion_and_insertion_allele
 +
  Deletion_polymorphism
 +
  Insertion_allele
 +
  Insertion_polymorhism
 +
  KO_consortium_allele
 +
  Mos_insertion
 +
  NBP_knockout_allele
 +
   NemaGENETAG_consortium_allele
 +
   Substitution_allele
 +
  Transposon_insertion
 +
  Engineered_allele
 +
</pre>
  
Exchange newest release with older one by removing old release and or change ./xace launch path to new release etc. $ rm &lt;old WS release&gt;
+
These data will populate /home/postgres/work/pgpopulation/obo_oa_ontologies/geneace files <br>
 +
obo_name_variation<br>
 +
obo_data_variation <br>
  
=Updating .obo files=
+
in obo_name_variation, entries are like:
 +
<pre>
 +
WBVar00000020 ad487 2015-11-30 20:01:08
 +
</pre>
 +
in obo_data_variation, entries are like:
 +
<pre>
 +
WBVar00088136 id: WBVar00088136\nname: "ju2"\nspecies: "Caenorhabditis elegans"\nstatus: "Live"\ngene: "WBGene00006363 syd-1"\nreference: "WBPaper00005543" 2015-11-30 20:01:08
 +
</pre>
  
Instructions for retrieving object connections from the latest WS build
+
If a variation does not exist in the geneace dump, and hence not in obo_name/data_variation tables
 +
* retrieve a WBVarID from the variation nameserver at http://www.sanger.ac.uk/sanger/Worm_NameServer (you will need a login and password, which may take a while to be assigned)
 +
* enter the public name and WBVarId, separated by a space OR tab, into the TempVariationObo http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=TempVariationObo  * * note: The form can take columns of data as long as it is in the format of <allele public name> <WBVarID>. 
 +
The information is added immediately to the obo_name_variation and should be available through the OA variation field (a form reload may be necessary).
  
Variation-gene and variation-paper connection information retrieved in phenote depends on information from the latest WS (install latest release).
+
If the allele already has a WBVarID but does not exist in the nightly geneace dump, curators should still enter the object through the generic.cgi. (http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=TempVariationObo) <br>
When the new release is ready these .obo files in tazendra need to be repopulated with the most current information.
+
When the object comes through during the geneace dump, its information will be captured and overwritten in obo_data_variation. <br>
 +
Variations that already exist in geneace but do not come through the geneace dump (e.g., million mutation variations) remain in obo_data_variation and are still available through the drop down.  
  
Run queries on the latest WS release for the most current information.
 
  
 +
If a variation needs correcting, go to /home/azurebrd/public_html/cgi-bin/data/obo_tempfile_variation and you can manually edit the file.
 +
--[[User:Kyook|Kyook]] ([[User talk:Kyook|talk]]) 20:01, 29 September 2017 (UTC)
  
----
+
==Obsolete==
  
==AQL Queries==
+
''Obsolete-> To check if the re-population scripts worked, check out the [http://tazendra.caltech.edu/~azurebrd/var/work/phenote/ws_current.obo WS_current] info field. The date will tell you when it was last updated; it should reflect the date the script was run. ''
  
===Variation_gene connections===
+
<br>
''Find all variations in the allele group (excludes SNPs etc.) along with the WBGeneID and public gene name of the gene they are assigned, if that is available.''
+
''Obsolete ->
You will be making a file named '''WS200_vargene.txt''' that is a combination of ''WS200_vargene0.txt'' and ''WS200_transposons.txt''
+
Updating variation obo
 +
This script looks at the files in:<br>
 +
home/acedb/jolene/WS_AQL_queries
 +
  Variation_gene.txt
 +
  total_variations.txt
 +
  clone_info.txt
 +
  strains.txt
 +
  rearr_simple.txt
 +
  expr_cluster.txt
 +
  
* ''WS200_vargene.txt''
+
  and the new Gene Class file to
select g, g->gene, g->gene->public_name, g->reference from g in class variation where exists_tag g->allele
+
populate the obo_*_geneclass tables.
Export as WS200_vargene.txt to your desktop (choose Separator character set to blank (TAB))
 
* ''WS200_transposons.txt''
 
  select v, v->gene, v->gene->public_name, v->reference from v in class variation where exists_tag v->transposon_insertion and exists v->gene
 
Export as WS200_transposons.txt to your desktop (choose Separator character set to blank (TAB))
 
* Make '''WS200_vargene.txt''' by copying and pasting ''WS200_transposons.txt'' to the end of ''WS200_vargene.txt'' and saving as ''WS200_vargene.txt''
 
* ''total_variations.txt''
 
select v, v->gene, v->gene->public_name, v->reference from v in class variation
 
Export as total_variations.txt to your desktop (choose Separator character set to blank (TAB))
 
//This is required for building an exclusion list that filters out SNPs
 
  
===Transgene_summary_paper connections===
+
To launch the full update manually log in to tazendra as acedb and do
''List transgenes already linked to a paper''
+
wget "http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=UpdateVariationObo"
* '''WS200_transpapsum.txt'''
 
select t, t->reference, t->summary from t in class transgene where exists t->reference
 
Export as WS200_transpapsum.txt to your desktop (choose Separator character set to blank (TAB))
 
  
===Rearrangement_inside_gene connections===
+
Because it takes so long, make sure to wait until people do not require the ontologies to work in the OA.
''List rearrangements with LG, 'genes inside' and ‘gene outside’ (public names only)''
 
* '''WS200_rearragene.txt'''
 
select r, r->map, r->gene_inside->public_name, r->gene_outside->public_name from r in class rearrangement
 
Export as WS200_rearragene.txt your desktop (choose Separator character set to blank (TAB))
 
  
 +
Two scripts run off of these files to update the .obo's for the OA.  The scripts are on tazendra and run off of files '''Variation_gene.txt''', ''(transgene_summary_reference.txt-obsolete)'' and '''rearr_simple.txt'''. So files need to be transferred to tazendra and renamed to be recognizable by those scripts. Transfer files to tazendra: scp all files to acedb@tazendra.caltech.edu:/home/acedb/jolene/WS_AQL_queries''
  
----
+
''Obsolete ->Cron job: populate_newobjects_cgi_postgres_tables.pl  updates information based on Variation_gene.txt (and transgene_summary.txt). This script is required for posting new allele or transgene entries on to the New objects cgi and sending notifications to the relevant curators. (Make sure files are named accordingly or the program won’t see them).''
 +
 +
''Obsolete ->http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=AddToVariationObo''
  
==Repopulating .obo's==
+
===AQL Queries for updating the variation obo===
 +
Instructions for retrieving object connections from the latest WS build
 +
Grab latest build from spica.
  
Two scripts need to be run to update the .obo's for phenote. It is important to run these scripts every time the variation information is updated. The scripts are on tazendra and run off of files '''Variation_gene.txt''', '''transgene_summary_reference.txt''' and '''rearr_simple.txt'''. So files need to be transferred to tazendra and renamed to be recognizable by those scripts.  
+
Variation-gene, variation-paper connections are used to select for variations of type allele or those that affect CDS (can include transposon and polymorphism alleles). This is necessary to keep the size of the variation autocomplete file manageable and not slow down the loading and using of the OA.
  
===Transfer files to tazendra===
+
===Variation_gene connections===
From within the directory that contains the files you just downloaded
+
''Find all variations in the allele group (excludes SNPs etc.) along with the WBGeneID and public gene name of the gene they are assigned, if that is available.''
send files to ''acedb@tazendra.caltech.edu:/home/acedb/jolene/WS_AQL_queries''
+
You will be making a file named '''Variation_gene.txt''' that is a combination of ''vargene.txt'' and ''transposons.txt''
  
*Send and '''Rename''' ''WS200_vargene.txt'' to ''Variation_gene.txt''
+
* ''vargene.txt''  
  $ scp WS200_vargene.txt acedb@tazendra.caltech.edu:/home/acedb/jolene/WS_AQL_queries/Variation_gene.txt  
+
  select a, a->gene, a->gene->public_name, a->reference from a in class variation where exists_tag a->allele
 +
Export as vargene.txt (choose Separator character set to blank (TAB))
  
*Send and '''Rename''' ''WS200_rearragene.txt'' to ''rearr_simple.txt''
+
* ''transposons.txt''  
  $ scp WS200_rearragene.txt acedb@tazendra.caltech.edu:/home/acedb/jolene/WS_AQL_queries/rearr_simple.txt  
+
  select t, t->gene, t->gene->public_name, t->reference from t in class variation where exists_tag t->transposon_insertion and exists t->gene
 +
Export as above as transposons.txt
  
*Send and '''Rename''' ''WS200_transpapsum'' to ''transgene_summary_reference.txt''
+
* Make '''Variation_gene.txt''' by copying and pasting ''transposons.txt'' to the end of ''vargene.txt'' and saving as ''Variation_gene.txt''
$ scp WS200_transpapsum.txt acedb@tazendra.caltech.edu:/home/acedb/jolene/WS_AQL_queries/transgene_summary.txt  
 
  
  $ scp total_variations.txt acedb@tazendra.caltech.edu:/home/acedb/jolene/WS_AQL_queries/total_variations.txt
+
* ''total_variations.txt''
 +
select a from a in class variation where exists a->variation_type
 +
  Export as total_variations.txt as above//This is required for building an exclusion list that filters out SNPs, and is referred to as a junk list
  
===Scripts to repopulate Phenote .obo files===  
+
===Transgene_summary_paper connections===
*''populate_newobjects_cgi_postgres_tables.pl''  updates information based on Variation_gene.txt and transgene_summary.txt. This script is required for posting new allele and transgene entries on to the New objects cgi and sending notifications to the relevant curators. (Make sure files are named accordingly or the program won’t see them).
+
''List transgenes already linked to a paper''
 +
* '''transgene_summary.txt'''  
 +
select t, t->reference, t->summary from t in class transgene where exists t->reference
 +
  Export as transgene_summary.txt
  
Note: As of 6/10/2010, we have discontinued the use of the 'New Variation!' object cgi page
+
===Rearrangement_inside_gene connections===
 +
''List rearrangements with LG, 'genes inside' and ‘gene outside’ (public names only)''
 +
* '''rearr_simple.txt'''
 +
select r, r->map, r->gene_inside->public_name, r->gene_outside->public_name from r in class rearrangement
 +
Export as rearr_simple.txt
  
*''make_obo.pl'' creates a text .obo based on rearr_simple.txt and Variation_gene.txt. This script populates the WS current info (which is needed for the Term info Display in the OA).
+
===Strain info===
 +
Added 5.17.11 for Daniela
 +
*'''strains.txt'''
 +
select s, s->genotype, s->location from s in class strain
  
NOTE: Most likely going to be obsolete after the 'Update' button gets inserted into the phenotype OA
+
===Clone info===
 
+
Added 5.17.11 for Daniela
Both apps are on tazendra in the same directory as the updated variation info.
+
*'''clone_info.txt'''
 +
select a, a->Type, a->transgene, a->strain, a->general_remark, a->location, a->accession_number, a->reference from a in class clone
  
cd to /home/acedb/jolene/WS_AQL_queries/
 
$ ./populate_newobjects_cgi_postgres_tables.pl
 
$ ./make_obo.pl
 
  
 +
----
  
-Jolene
 
 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
NOTE: populate_gin_variation updates data based on variation_tab_wbgene
 
file (in postgres / cgi) , which is no longer current.
 
 
To check if the re-population scripts worked, check out the [http://tazendra.caltech.edu/~azurebrd/var/work/phenote/ws_current.obo WS_current] info field
 
The date will tell you when it was last updated; it should reflect the date the script was run.
 
  
<br>
 
  
 
[http://www.wormbase.org/wiki/index.php/Caltech_documentation ''back'']  
 
[http://www.wormbase.org/wiki/index.php/Caltech_documentation ''back'']  
Line 150: Line 190:
  
 
[[Category:Phenotype Curation]]
 
[[Category:Phenotype Curation]]
 +
[[Category:Phenotype]]
 +
 +
 +
 +
/home/postgres/work/pgpopulation/obo_oa_ontologies/update_obo_oa_ontologies.pl

Latest revision as of 18:48, 13 August 2020

This obo file is used to display term info for variations, transgenes, strains, clones, rearrangements, and genes in any OA interface that contains these objects. It needs to be updated with every release of acedb. The obo file is created from AQL queries of the latest WS.

In the Phenotype OA, all object fields except strain should be autocomplete drop-down lists. The files that populate these fields are in an obo-like format: information attached to each object shows up in the term info box when the object is selected. Keeping these files updated from acedb and showing this information in the term info box helps curation, because it verifies the identity of the object being curated and saves the curator from looking up and verifying the information manually. Although these files are not technically 'obo files', the term obo file is used here for any flat file that contains a list of terms with accompanying information for display in the term info window, in contrast to flat files that contain only a simple list of terms.

obo files for the phenotype OA

The following fields use an obo file; the name, source, or generating script of each obo file is noted where available.

  • Pub field -> paper.obo
  • Person field
  • Variation ->obo_name_variation; obo_data_variation
  • Transgene ->trp tables
  • Rearrangement ->obo_oa_ontology
  • Caused by -> WBGene
  • Phenotype ->phenotype.obo
  • Molecule ->molecule.obo
  • Anatomy ->WBbt.obo
  • Life stage ->worm_development.obo
  • Child of ->phenotype.obo
  • Laboratory evidence
  • Entity ->chebi.obo, rex.obo, gene_ontology_ext.obo
  • Quality ->quality.obo

obo OA ontologies

See the ontology update pipeline (http://wiki.wormbase.org/index.php/Source_and_maintenance_of_non-WBGene_info) for the variation, clone, strain, rearrangement, and laboratory obo tables. These files are updated from a nightly geneace dump.
Transgenes should be updated directly from the trp tables.
Variations are also updated from the

All other ontologies in the phenotype OA should be updated through the following script:

/home/postgres/work/pgpopulation/obo_oa_ontologies/update_obo_oa_ontologies.pl

obo tables are populated by downloading :

Phenotype term: http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/phenotype_ontology_obo.cgi
Molecule ChEBI: ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.obo
Gene ontology: http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology_ext.obo
Anatomy: http://obo.cvs.sourceforge.net/*checkout*/obo/obo/ontology/anatomy/gross_anatomy/animal_gross_anatomy/worm/worm_anatomy/WBbt.obo
Life stage: http://www.berkeleybop.org/ontologies/obo-all/worm_development/worm_development.obo
Pathogen: h_pap_species_index
PATO entity: http://www.berkeleybop.org/ontologies/obo-all/rex/rex.obo
PATO quality: http://www.berkeleybop.org/ontologies/obo-all/quality/quality.obo
NOTE: Across all ontologies, every "is_obsolete: true" entry is changed to display in red.


Every day this script calls an incremental update from the variation nameserver:

http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=TempVariationObo 
Note: this was changed from http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=AddToVariationObo

On the 1st of the month the script calls a full variation obo update:

http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=UpdateVariationObo
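
The daily/monthly schedule described above can be pictured with a short Python sketch. This is not the logic of update_obo_oa_ontologies.pl, which is not shown on this page. The function name `variation_update_url` is hypothetical, and the sketch assumes that on the 1st of the month only the full update is requested (the page does not say whether the incremental update also runs that day).

```python
from datetime import date

# Base URL of the generic.cgi form, as given on this page.
BASE_URL = "http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi"


def variation_update_url(today: date) -> str:
    """Return the update URL to request on a given day.

    Assumption: the full update replaces the incremental one on the 1st.
    """
    action = "UpdateVariationObo" if today.day == 1 else "TempVariationObo"
    return f"{BASE_URL}?action={action}"


if __name__ == "__main__":
    # Print the URL that would be requested today under the assumption above.
    print(variation_update_url(date.today()))
```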


--kjy 22:33, 4 September 2013 (UTC)

Repopulating variation OA obo from geneace

For each variation that has one of the Methods listed below, the following information is retrieved:

  • WBVar ID
  • public_name
  • gene association
  • references
  • method
  • status
Only variations with one of the following attached methods are retrieved. 
   Allele
   Deletion_allele
   Deletion_and_insertion_allele
   Deletion_polymorphism
   Insertion_allele
   Insertion_polymorhism
   KO_consortium_allele
   Mos_insertion
   NBP_knockout_allele
   NemaGENETAG_consortium_allele
   Substitution_allele
   Transposon_insertion
   Engineered_allele

These data will populate /home/postgres/work/pgpopulation/obo_oa_ontologies/geneace files
obo_name_variation
obo_data_variation

In obo_name_variation, entries look like:

WBVar00000020	ad487	2015-11-30 20:01:08

In obo_data_variation, entries look like:

WBVar00088136	id: WBVar00088136\nname: "ju2"\nspecies: "Caenorhabditis elegans"\nstatus: "Live"\ngene: "WBGene00006363 syd-1"\nreference: "WBPaper00005543"	2015-11-30 20:01:08
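
To make the layout of an obo_data_variation entry concrete, here is a minimal Python sketch that splits one entry into its parts. It assumes the three columns are tab-separated and that the `\n` inside the data column is stored as a literal backslash followed by `n`, as the example above suggests. The function name `parse_obo_data_line` is hypothetical and is not part of the pipeline.

```python
def parse_obo_data_line(line: str) -> dict:
    """Split one obo_data_variation entry into ID, tag/value pairs, and timestamp."""
    # Assumption: exactly three tab-separated columns per entry.
    var_id, data, timestamp = line.rstrip("\n").split("\t")
    fields = {}
    # Assumption: tag/value pairs are separated by a literal backslash + "n".
    for item in data.split("\\n"):
        key, _, value = item.partition(": ")
        # A tag such as "reference" may repeat, so collect values in a list.
        fields.setdefault(key, []).append(value.strip('"'))
    return {"id": var_id, "fields": fields, "timestamp": timestamp}


if __name__ == "__main__":
    sample = (
        'WBVar00088136\tid: WBVar00088136\\nname: "ju2"\\n'
        'species: "Caenorhabditis elegans"\\nstatus: "Live"\\n'
        'gene: "WBGene00006363 syd-1"\\nreference: "WBPaper00005543"'
        "\t2015-11-30 20:01:08"
    )
    print(parse_obo_data_line(sample))
```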

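If a variation does not exist in the geneace dump, and hence is not in the obo_name_variation and obo_data_variation tables:

  • Retrieve a WBVarID from the variation nameserver at http://www.sanger.ac.uk/sanger/Worm_NameServer (you will need a login and password, which may take a while to be assigned).
  • Enter the public name and WBVarID, separated by a space or tab, into the TempVariationObo form at http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=TempVariationObo. Note: the form can take columns of data as long as each line is in the format <allele public name> <WBVarID>.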

The information is added immediately to the obo_name_variation and should be available through the OA variation field (a form reload may be necessary).
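
Before pasting a batch into the TempVariationObo form, it can help to check that each line has the form `<allele public name> <WBVarID>`, separated by a space or tab. The Python sketch below is only a local pre-check and does not contact the form. The function name `check_temp_variation_lines` is hypothetical, and the ID pattern (WBVar followed by eight digits) is an assumption based on the example IDs on this page.

```python
import re

# Assumption: WBVarIDs look like "WBVar" + 8 digits, as in WBVar00000020.
WBVAR_RE = re.compile(r"^WBVar\d{8}$")


def check_temp_variation_lines(text: str):
    """Return (good, bad): parsed (public_name, WBVarID) pairs and rejected lines."""
    good, bad = [], []
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        parts = raw.split()  # splits on spaces or tabs
        if len(parts) == 2 and WBVAR_RE.match(parts[1]):
            good.append((parts[0], parts[1]))
        else:
            bad.append(raw)
    return good, bad


if __name__ == "__main__":
    batch = "ad487 WBVar00000020\nju2\tWBVar00088136\nWBVar00000001 xx1\n"
    ok, rejected = check_temp_variation_lines(batch)
    print("ok:", ok)
    print("rejected:", rejected)
```

Lines with the columns in the wrong order, or with a public name that contains spaces, end up in the rejected list and can be corrected by hand.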

If the allele already has a WBVarID but does not exist in the nightly geneace dump, curators should still enter the object through the generic.cgi TempVariationObo form (http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=TempVariationObo).
When the object comes through during the geneace dump, its information will be captured and overwritten in obo_data_variation.
Variations that already exist in geneace but do not come through the geneace dump (e.g., million mutation variations) remain in obo_data_variation and are still available through the drop down.


If a variation needs correcting, manually edit the file /home/azurebrd/public_html/cgi-bin/data/obo_tempfile_variation. --Kyook (talk) 20:01, 29 September 2017 (UTC)

Obsolete

Obsolete-> To check if the re-population scripts worked, check out the WS_current info field. The date will tell you when it was last updated; it should reflect the date the script was run.


Obsolete -> Updating variation obo: this script looks at the files in:

/home/acedb/jolene/WS_AQL_queries
  Variation_gene.txt
  total_variations.txt
  clone_info.txt
  strains.txt
  rearr_simple.txt
  expr_cluster.txt

and the new Gene Class file to populate the obo_*_geneclass tables.

To launch the full update manually, log in to tazendra as acedb and run:

wget "http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=UpdateVariationObo"

Because the full update takes a long time, run it only when nobody needs the ontologies to work in the OA.

Two scripts run off of these files to update the .obo's for the OA. The scripts are on tazendra and read Variation_gene.txt, rearr_simple.txt and (obsolete) transgene_summary_reference.txt, so the files need to be transferred to tazendra and renamed so that the scripts recognize them. To transfer the files, scp all of them to acedb@tazendra.caltech.edu:/home/acedb/jolene/WS_AQL_queries

Obsolete ->Cron job: populate_newobjects_cgi_postgres_tables.pl updates information based on Variation_gene.txt (and transgene_summary.txt). This script is required for posting new allele or transgene entries on to the New objects cgi and sending notifications to the relevant curators. (Make sure files are named accordingly or the program won’t see them).

Obsolete ->http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=AddToVariationObo

AQL Queries for updating the variation obo

Instructions for retrieving object connections from the latest WS build. Grab the latest build from spica.

Variation-gene and variation-paper connections are used to select variations of type allele or those that affect CDS (this can include transposon and polymorphism alleles). This is necessary to keep the size of the variation autocomplete file manageable, so that it does not slow down loading and using the OA.

Variation_gene connections

Find all variations in the allele group (excludes SNPs etc.) along with the WBGeneID and public gene name of the gene they are assigned to, if that is available. You will be making a file named Variation_gene.txt that is a combination of vargene.txt and transposons.txt.

  • vargene.txt
select a, a->gene, a->gene->public_name, a->reference from a in class variation where exists_tag a->allele 
Export as vargene.txt  (choose Separator character set to blank (TAB))
  • transposons.txt
select t, t->gene, t->gene->public_name, t->reference from t in class variation where exists_tag t->transposon_insertion and exists t->gene
Export as above as transposons.txt
  • Make Variation_gene.txt by copying and pasting transposons.txt to the end of vargene.txt and saving as Variation_gene.txt
  • total_variations.txt
select a from a in class variation where exists a->variation_type
Export as total_variations.txt as above. This file is required for building an exclusion list that filters out SNPs, and is referred to as a junk list.
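
The copy-and-paste step that builds Variation_gene.txt can also be done with a few lines of Python. This is a minimal sketch, assuming both AQL exports are plain tab-separated text files in the current directory. The function name `build_variation_gene` is hypothetical and is not one of the scripts on tazendra.

```python
from pathlib import Path


def build_variation_gene(vargene: str, transposons: str, out: str) -> int:
    """Append the transposon export to the allele export and write the result.

    Returns the number of lines written.
    """
    parts = []
    for name in (vargene, transposons):
        text = Path(name).read_text()
        # Make sure each file ends with a newline so rows do not run together.
        if text and not text.endswith("\n"):
            text += "\n"
        parts.append(text)
    Path(out).write_text("".join(parts))
    return sum(part.count("\n") for part in parts)


if __name__ == "__main__":
    inputs = ("vargene.txt", "transposons.txt")
    if all(Path(name).exists() for name in inputs):
        written = build_variation_gene(*inputs, "Variation_gene.txt")
        print(f"wrote {written} lines to Variation_gene.txt")
    else:
        print("vargene.txt and transposons.txt were not found in this directory")
```

Adding the missing trailing newline guards against the last row of vargene.txt being fused with the first row of transposons.txt, which is easy to do when pasting by hand.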

Transgene_summary_paper connections

List transgenes already linked to a paper

  • transgene_summary.txt
select t, t->reference, t->summary from t in class transgene where exists t->reference
Export as transgene_summary.txt

Rearrangement_inside_gene connections

List rearrangements with LG, 'genes inside' and 'gene outside' (public names only)

  • rearr_simple.txt
select r, r->map, r->gene_inside->public_name, r->gene_outside->public_name from r in class rearrangement
Export as rearr_simple.txt 

Strain info

Added 5.17.11 for Daniela

  • strains.txt
select s, s->genotype, s->location from s in class strain

Clone info

Added 5.17.11 for Daniela

  • clone_info.txt
select a, a->Type, a->transgene, a->strain, a->general_remark, a->location, a->accession_number, a->reference from a in class clone





--kjy


/home/postgres/work/pgpopulation/obo_oa_ontologies/update_obo_oa_ontologies.pl