Building WormMart

WormMart is WormBase's implementation of the BioMart data warehouse.

In simple terms, WormMart is built by transforming WormBase's AceDB database into a BioMart schema MySQL database. Standard BioMart tools are then used to add user interface configuration details to the database.

The code forms part of BioMart's mart-build project. See Web CVS for more details.

Building WormMart

The original instructions on building WormMart can be found here. They are pasted below for ease of access.

==========================================================

Building the WormBase BioMart Database

==========================================================

This document provides information on how to build the WormBase BioMart database (WormMart).


1. Software


1a. Install the ACeDB Software

The ACeDB software should be installed as detailed here;

 http://search.cpan.org/src/LDS/AcePerl-1.91/docs/ACEDB.HOWTO

Ensure that the saceserver binary is found in your path;

 $ which saceserver
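
As a slightly more general form of this check, the following is a minimal sketch for a POSIX shell (sh or bash, not the csh syntax used later in this document). The helper name missing_tools is hypothetical and not part of the ACeDB or mart-build software; it prints each named program that cannot be found in the path.

```shell
# Sketch only: missing_tools is a hypothetical helper, not part of mart-build.
# Prints the name of each given program that is not found in $PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v is the POSIX way to ask whether a command can be resolved
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The programs used in this document; no output means all were found.
missing_tools saceserver cvs wget mysql
```

If saceserver is listed in the output, revisit the ACeDB installation before continuing.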


1b. Install the Perl client code

Go to a convenient directory in which to install the Perl code

 $ cd $HOME/biomart

Get the Perl client code from Ensembl's CVS

 $ cvs -d :pserver:cvsuser@cvsro.sanger.ac.uk:/cvsroot/biomart login
 [pass CVSUSER]
 $ cvs -d :pserver:cvsuser@cvsro.sanger.ac.uk:/cvsroot/biomart co \
 mart-build



2. Databases


2a. Install the latest WormBase ACeDB database

Set a variable for the WormBase release being built, and one for the directory where the WormBase ACeDB databases live;

 $ set WB_RELEASE=191
 $ set DBDIR=/nfs/disk100/wormpub/DATABASES

If you already have access to the latest WormBase database (e.g. if you are working on cbi4 at Sanger, and can read $DBDIR) you can skip ahead to step 2b.

Get the source ACeDB database from Sanger's FTP site;

 $ cd $DBDIR
 $ wget -r -nH --cut-dirs=2 \
   "ftp://ftp.sanger.ac.uk/pub/wormbase/WS${WB_RELEASE}"
 $ cd WS${WB_RELEASE}
 $ ./INSTALL


2b. Start the Ace server

The server is started by calling the saceserver binary as follows;

 $ saceserver $DBDIR/WS${WB_RELEASE} 23100 600000:600000:100000 &


2c. Create an empty target WormMart database

Use setenv so that $MART_DBNAME is available as an environment variable;

 $ setenv MART_DBNAME "wormmart_${WB_RELEASE}"
 $ alias mysql_worm 'mysql -hia64d -uwormadmin -psecret'
 $ mysql_worm -e "create database $MART_DBNAME"
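
The variable and alias commands in section 2 use csh syntax (set, setenv, alias). For readers working in sh or bash, a rough equivalent is sketched below. The values are those used above; using a shell function in place of the csh alias is a choice made for this sketch, not something taken from the original instructions.

```shell
# sh/bash equivalents of the csh settings used in section 2 (sketch only)
WB_RELEASE=191                              # csh: set WB_RELEASE=191
DBDIR=/nfs/disk100/wormpub/DATABASES        # csh: set DBDIR=...
export MART_DBNAME="wormmart_${WB_RELEASE}" # csh: setenv MART_DBNAME ...

# A function stands in for the csh alias; "$@" passes all arguments through.
mysql_worm() {
    mysql -hia64d -uwormadmin -psecret "$@"
}
```

With these definitions in place, the create database command above can be run unchanged.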



3. Running the Build


The following (for running launch_wormmart_build.sh) assumes that you are on the Sanger cluster, and have access to the LSF queue called 'long' that supports jobs up to 8G in size. If not, it is fairly simple to change the ./launch_wormmart_build.sh script to run the jobs on the local host.

Assuming that the mart-build code was installed into $HOME/biomart in stage 1b;

 $ cd $HOME/biomart/mart-build/scripts/wormbase-mart
 $ mkdir -p ./logs

Check the settings at the start of the launch_wormmart_build.sh script

 $ <editor> launch_wormmart_build.sh

And start the process

 $ ./launch_wormmart_build.sh



4. QC


Is there anything in the logs that suggests a failure? The logs/*.err files should all end in;

 [INFO] Completed VariationLoader
 [INFO] Completed ace2mart

Check this with, e.g.;

 $ tail -n2 ./logs/*.err
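
With many log files, reading the tail output by eye is error-prone. The following is a minimal sketch of an automated version of the same check, written for a POSIX shell (sh or bash) rather than csh. The function names log_completed and check_logs are hypothetical, and the sketch assumes that the two completion messages quoted above appear, in that order, within the last two lines of each log.

```shell
# Sketch only: these helpers are hypothetical and not part of mart-build.

# Succeeds if the last two lines of the log named in $1 contain both
# completion messages, VariationLoader first and ace2mart second.
log_completed() {
    last=$(tail -n 2 "$1")
    case "$last" in
        *"Completed VariationLoader"*"Completed ace2mart"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints the name of every *.err file in directory $1 that does not end
# as expected; returns non-zero if any such file was found.
check_logs() {
    failed=0
    for f in "$1"/*.err; do
        [ -e "$f" ] || continue      # no .err files: nothing to check
        if ! log_completed "$f"; then
            echo "INCOMPLETE: $f"
            failed=1
        fi
    done
    return $failed
}
```

Running check_logs ./logs from the wormbase-mart directory prints nothing when every log ends as expected.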



5. Post-build tasks


5a. Database copy

To copy, for instance, the database from Sanger (where it is built) to CSHL (where it is deployed), the following procedure could be followed;

 $ set MART_HOST=hostname.of.mysql.server
 $ ssh $MART_HOST #Dumping tab delimited files so must be on same machine
 $ set MART_DBNAME=wormmart_191
 $ cd ${HOME}/data/mysql
 $ mkdir $MART_DBNAME
 $ chmod 777 $MART_DBNAME #So that the mysql user can write to the dir
 $ mysqldump -h 127.0.0.1 -T $MART_DBNAME $MART_DBNAME
 (~ ?? mins)
 $ tar -zcvf ${MART_DBNAME}.tgz $MART_DBNAME
 (~ ?? mins)
 $ set MART_REMOTE='formaggio.cshl.edu'
 $ scp ${MART_DBNAME}.tgz ${USER}@${MART_REMOTE}:~/
 (~ 20 mins)
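 $ exit # Leave the MySQL server host
 $ set MART_REMOTE='formaggio.cshl.edu' # Set again; the value set on $MART_HOST is lost on exit
 $ ssh ${USER}@${MART_REMOTE} # $MART_DBNAME does not carry over, so it is set again below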
 # check that there is enough free space. At least 10G
 $ df -h /usr/local/mysql/data
 $ screen # Do this under screen for safety.
          # Create a new shell if already under screen
 $ cd
 $ set MART_DBNAME=wormmart_191
 $ tar -zxvf ${MART_DBNAME}.tgz 
 $ cd $MART_DBNAME
 $ mysql -e "create database $MART_DBNAME"
 $ cat *.sql | mysql $MART_DBNAME
 $ mysqlimport --local $MART_DBNAME $PWD/*.txt 
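
The free-space check in the procedure above (at least 10G under /usr/local/mysql/data) can also be scripted. The following is a minimal sketch for a POSIX shell (sh or bash, not csh); the function name has_free_space is hypothetical, and it assumes the standard df -P output layout in which the fourth column of the second line is the available space.

```shell
# Sketch only: has_free_space is a hypothetical helper.
# Usage: has_free_space DIR MIN_GB
# Succeeds if the filesystem holding DIR has at least MIN_GB gigabytes free.
has_free_space() {
    # -P selects the POSIX single-line format, -k reports 1024-byte blocks
    avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    need_kb=$(( $2 * 1024 * 1024 ))
    [ "$avail_kb" -ge "$need_kb" ]
}
```

For example, has_free_space /usr/local/mysql/data 10 || echo "Not enough space" prints a warning before the import is attempted.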


5b. Database Configuration

This is where we add meta tables to the WormMart database;

 - Open MartEditor (Java app)
 - Connect to the previous WormMart database,
 - Select "File"->"Move All", select the new database when prompted.

5c. MartView Configuration

This is where we configure the MartView interface on the WormBase web site to use the latest WormMart database.

5d. QC

Make sure that the number of objects reported by count is the same in live and dev marts.