Difference between revisions of "Updating ontology (.obo) files for the OA"

From WormBaseWiki

Revision as of 21:06, 31 May 2008

Updating local acedb to latest available WS (instructions from Wen)

Download the latest build from the Sanger website. From the command line (X11 on Mac OS), go to the local directory where the old WS is installed, then log in anonymously to Sanger’s FTP site:

bash-3.2$ ftp ftp.sanger.ac.uk

Connected to ftpservice2.sanger.ac.uk.
220-ftp.sanger.ac.uk NcFTPd Server (free educational license) ready.
220-Wellcome Trust Sanger Institute FTP server
220-
220-Problems after login? Try using '-' as the first character of you
220-password.
220-
220-****
220-****
220-**** 7/9/06 FTP Server upgraded please report any problems to
220-****    ftpadmin@sanger.ac.uk
220-****
220
Name (ftp.sanger.ac.uk:Yook): anonymous
331 Guest login ok, send your complete e-mail address as password.
Password:

Go to the directory containing the WS releases, download the whole release (takes about 1 hour), then quit ftp:

FTP> cd pub/wormbase

FTP> get WS188.tar

[or get -R WS188 for the NcFTP client]

FTP> bye
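The interactive session above could also be scripted. The following is only a minimal sketch: the helper name ws_tar_url is hypothetical, the host and path are copied from the session above, and it assumes the server still offers the release as a single .tar file at that location.

```shell
# Build the FTP URL for a release tarball.
# Host and path come from the interactive session above; the helper name
# ws_tar_url is an illustrative assumption, not part of any WormBase tool.
ws_tar_url() {
    release="$1"    # e.g. WS188
    printf 'ftp://ftp.sanger.ac.uk/pub/wormbase/%s.tar\n' "$release"
}

# Usage (not run here; needs network access and a server that still exists):
#   curl -O "$(ws_tar_url WS188)"
```

curl's -O option saves the file under its remote name, so the result would be WS188.tar in the current directory, as with the ftp get command.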

Unpack the tar file, then change into the new WS directory and run INSTALL to install the new database (~15 minutes):

$ tar -xvvf WS188.tar

$ cd WS188

$ ./INSTALL

The readout after installation is as follows:

ACEDB installation script:

Yook will be known as the acedb-administrator
We are going to install the acedb system in the present directory:
    /Users/Yook/WS_latest/WS188
This is your available disk space in this directory:
Filesystem  1024-blocks      Used Available Capacity  Mounted on
/dev/disk0s2  488050672 149925748 337868924    31%    /
The amount of space you need will depend on what data you are installing.
For the source code and binary, you need around 15 Mb.
Should we proceed?  Please answer yes/no : yes

Swap in the newest release by removing the old release and/or changing the ./xace launch path to point to the new release:

$ rm -r <old WS release>

Updating .obo files

Retrieving object connections from the latest build files

Variation-gene and variation-paper connection information retrieved in Phenote depends on information from the latest WS. After the upload, these .obo files in Postgres need to be repopulated with the most current information.

Run the queries below on the latest WS release to get the most current information. Keep the column output of each query unchanged, since the scripts that read the exported files probably depend on it.

Allele_gene connections

To: Find all variations in the allele group (excludes SNPs etc.) along with the WBGeneID and public gene name of the gene they are assigned to, if that is available.

Run the following AQL:  

select a, a->gene, a->gene->public_name from a in class variation where exists_tag a->allele

Sort results (optional)
Export as WS#_vargene into /Users/Yook/Postgres on scoobydoo (choose Separator character set to blank (TAB))
As of WS189 this query returns 20653 lines. It includes all alleles and, in some cases, multiple genes linked to one allele (e.g. isoforms, alternative gene models etc.)
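Because one allele can appear on several lines, it may help to see which alleles are affected before loading the file. This is a minimal sketch, assuming the export is tab-separated with the allele name in the first column; the function name multi_gene_alleles is hypothetical.

```shell
# List the values of column 1 (the allele) that occur on more than one line
# of a tab-separated export, i.e. alleles listed with several genes.
# The function name is illustrative only.
multi_gene_alleles() {
    cut -f1 "$1" | sort | uniq -d
}

# Usage (file name follows the WS#_vargene export convention above):
#   multi_gene_alleles WS189_vargene.txt
```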

Transgene_summary_paper connections

To: List transgenes already linked to a paper

Run the following AQL: 

select t, t->reference, t->summary from t in class transgene where exists t->reference

Sort results (optional)
Export as WS#_transpapsum into /Users/Yook/Postgres on scoobydoo (choose Separator character set to blank (TAB))
As of WS189 this query returns 5473 lines

Rearrangement_inside_gene connections

To: List rearrangements with LG, 'gene inside' and 'gene outside' (public names only)

Run the following AQL: 

select r, r->map, r->gene_inside->public_name, r->gene_outside->public_name from r in class rearrangement

Sort results (optional)
Export as WS#_rearragene into /Users/Yook/Postgres on scoobydoo (choose Separator character set to blank (TAB))
As of WS189 this query returns 10054 lines
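Since the column layout of the exports probably matters to the scripts that read them, a quick check before transferring the files may be useful. This is a sketch under assumptions: the exports are tab-separated, empty values still leave their tab in place, and the expected column counts (3, 3 and 4) are simply read off the three select clauses above. The function name bad_column_lines is hypothetical.

```shell
# Print the number of lines whose tab-separated field count differs from
# the expected one. 0 means every line has the expected layout.
# The function name is illustrative only.
bad_column_lines() {
    file="$1"
    expected="$2"
    awk -F '\t' -v n="$expected" 'NF != n { bad++ } END { print bad + 0 }' "$file"
}

# Usage, with column counts taken from the three queries above:
#   bad_column_lines WS189_vargene.txt 3
#   bad_column_lines WS189_transpapsum.txt 3
#   bad_column_lines WS189_rearragene.txt 4
```

If the export drops trailing empty columns instead of keeping the tabs, this check would report those lines, so a non-zero result is a prompt to look at the file rather than proof of an error.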

Repopulating Postgres

Two scripts need to be run to populate Postgres with the updated variation information. It is important to run these scripts every time the variation information is updated. The scripts are on tazendra and read the files Variation_gene.txt, transgene_summary_reference.txt and rearr_simple.txt, so the exported files need to be transferred to tazendra and renamed so that the scripts recognize them.

Transfer files to tazendra

Scripts are stored in acedb@tazendra.caltech.edu:/home/acedb/karen/populate_gin_variation

Each scp command below renames the file as it copies it; WS191 is used as the example release.

Rename WS191_vargene.txt to Variation_gene.txt

$ scp WS191_vargene.txt acedb@tazendra.caltech.edu:/home/acedb/karen/populate_gin_variation/Variation_gene.txt

Rename WS191_rearragene.txt to rearr_simple.txt

$ scp WS191_rearragene.txt acedb@tazendra.caltech.edu:/home/acedb/karen/populate_gin_variation/rearr_simple.txt

Rename WS191_transpapsum.txt to transgene_summary_reference.txt

$ scp WS191_transpapsum.txt acedb@tazendra.caltech.edu:/home/acedb/karen/populate_gin_variation/transgene_summary_reference.txt
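The three transfers follow one pattern, so the commands for a new release can be generated rather than retyped. This is a minimal sketch that only prints the commands for review and does not run them; the function name print_transfer_commands is hypothetical, while the file names and destination are those given above.

```shell
# Print (do not run) the scp command for each export, mapping the
# WS-numbered file names to the names the tazendra scripts expect.
# The function name is illustrative only.
print_transfer_commands() {
    ws="$1"    # e.g. WS191
    dest='acedb@tazendra.caltech.edu:/home/acedb/karen/populate_gin_variation'
    printf 'scp %s_vargene.txt %s/Variation_gene.txt\n' "$ws" "$dest"
    printf 'scp %s_rearragene.txt %s/rearr_simple.txt\n' "$ws" "$dest"
    printf 'scp %s_transpapsum.txt %s/transgene_summary_reference.txt\n' "$ws" "$dest"
}

# Usage:
#   print_transfer_commands WS191
```

Printing first makes it easy to spot a wrong release number before anything is overwritten on tazendra.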

Scripts to populate Phenote .obo files

populate_ali_alleleinfo.pl updates information based on Variation_gene.txt and transgene_summary_reference.txt. Make sure files are named accordingly or the program won’t see them. DO NOT run populate_gin_variation.pl.
make_obo.pl creates a text .obo based on rearr_simple.txt and Variation_gene.txt. This script populates the WS current info.

Both apps are on tazendra in the same directory as the updated variation info.

cd to /home/acedb/karen/populate_gin_variation/

$ ./populate_ali_alleleinfo.pl

$ ./make_obo.pl
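Since the scripts will not see files that are misnamed, it can help to confirm that all three inputs are present before running them. This is a minimal sketch; the function name inputs_ready is hypothetical, and the file names are those listed in this section.

```shell
# Return 0 only if all three input files exist and are non-empty in the
# given directory; report each missing or empty file on stderr.
# The function name is illustrative only.
inputs_ready() {
    dir="$1"
    rc=0
    for f in Variation_gene.txt transgene_summary_reference.txt rearr_simple.txt; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage on tazendra (not run here):
#   cd /home/acedb/karen/populate_gin_variation/
#   inputs_ready . && ./populate_ali_alleleinfo.pl && ./make_obo.pl
```

Chaining with && means make_obo.pl runs only if the first script exits successfully, which assumes the Perl scripts return a non-zero status on failure.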

NOTE: populate_gin_variation updates data based on the variation_tab_wbgene file (in postgres / cgi), which is no longer current.

To check whether the re-population scripts worked, look at the WS_current info field (http://tazendra.caltech.edu/~azurebrd/var/work/phenote/ws_current.obo). The date shows when the file was last updated; it should match the date the scripts were run.



--kjy