

Based on the Getting Started Tutorial:

Users and Groups

sudo groupadd wormbase
sudo adduser -M intermine  // will be the user connecting to the mine database
sudo passwd intermine  // prompts for the new password
sudo useradd tharris // not strictly required of course
sudo usermod -a -G wormbase intermine
sudo usermod -a -G wormbase tharris

Install Prerequisites (Amazon Linux)


 $ sudo yum install svn git
 // Amazon Linux on EC2 ships with Java installed; on Debian: $ sudo apt-get install sun-java6-jdk
 $ sudo yum install ant antlr ant-antlr

Note: JAVA_HOME may be incorrectly set to the jre and not jdk. To correct this, remove the trailing /jre from the JAVA_HOME variable. It should look something like this:

export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-
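The JAVA_HOME correction described above can be sketched as a small shell helper. This is only an illustration: strip_jre is a hypothetical function name, and the snippet assumes JAVA_HOME is already exported in the current shell.

```shell
# Hypothetical helper: drop a trailing /jre (with or without a final slash) from a path.
strip_jre() {
    path="${1%/}"                  # remove one trailing slash, if any
    printf '%s\n' "${path%/jre}"   # remove a trailing /jre component, if present
}

# Only rewrite JAVA_HOME when it is already set.
if [ -n "${JAVA_HOME:-}" ]; then
    JAVA_HOME="$(strip_jre "$JAVA_HOME")"
    export JAVA_HOME
fi
```

A path that does not end in /jre is returned unchanged, so the snippet should be safe to leave in a shell profile.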


 $ sudo yum install postgresql postgresql-server postgresql-devel postgresql-contrib postgresql-docs

Set up postgresql users and create a database called "wormmine" accessible to a user called "intermine".

$ sudo service postgresql initdb

Configure Postgres to allow logging in via password:

$ sudo emacs /var/lib/pgsql9/data/pg_hba.conf
host all     all   password

Configure suggested performance settings:

$ sudo emacs /var/lib/pgsql9/data/postgresql.conf

Note: Performance settings have NOT yet been changed although they've all been entered into the configuration file.

Set up a database:

$ sudo /etc/init.d/postgresql start
[tharris@wb-dev: bin]> sudo -u postgres psql -d template1 -U postgres
psql (8.4.9)
Type "help" for help.

 $ sudo -u postgres createuser --createdb intermine  // the role must exist, and be allowed to create databases, before the createdb calls below
 $ createdb -U intermine wormmine
 $ createdb -U intermine wormmine-test
 $ createdb -U intermine wormmine-items
 $ createdb -U intermine unittest
 $ createdb -U intermine truncunittest
 $ createdb -U intermine fulldataset
 $ createdb -U intermine flatmodetest
 $ createdb -U intermine notxmltest
 $ createdb -U intermine bio-test
 $ createdb -U intermine bio-fulldata-test
 $ createdb -U intermine wormmine-userprofile
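The repeated createdb calls above could also be generated from a list. The sketch below only prints the commands (a dry run). print_createdb_commands and WORMMINE_DBS are hypothetical names introduced for illustration; the database names are the ones listed above.

```shell
# Database names taken from the createdb calls above.
WORMMINE_DBS="wormmine wormmine-test wormmine-items unittest truncunittest fulldataset flatmodetest notxmltest bio-test bio-fulldata-test wormmine-userprofile"

# Hypothetical helper: print one createdb command per database name.
# $1 = database user, remaining arguments = database names
print_createdb_commands() {
    owner="$1"; shift
    for db in "$@"; do
        printf 'createdb -U %s %s\n' "$owner" "$db"
    done
}

# Dry run: print the commands without executing them.
# $WORMMINE_DBS is left unquoted on purpose so it splits into separate names.
print_createdb_commands intermine $WORMMINE_DBS
```

Once the printed commands look right, the output can be piped to sh to run them.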

Note: port 5432 must be open in the EC2 security group if external access is desired.

Install bioseg into Postgres

$ tar xzf bioseg*
$ cd release-0.8
$ make USE_PGXS=t clean
$ make USE_PGXS=t
$ sudo make USE_PGXS=t install

Install bioseg into template1

$ cd /usr/share/pgsql/contrib

See for details.

Add a 100 GB data mount at /dev/xvdf

1. Create a new 100 GB volume in the console.
2. In the console, attach the volume at /dev/sdf (it appears on the instance as /dev/xvdf).
3. SSH to the proxy and format the disk
 > sudo mkfs.ext3 /dev/xvdf
4. Mount the disk
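 cd /usr/local/wormbase/intermine
 mkdir data
 sudo mount /dev/xvdf /usr/local/wormbase/intermine/data
 // set ownership after mounting, so it applies to the new filesystem rather than the hidden mount point
 sudo chown tharris:wormbase data
 sudo chmod 2775 data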

5. Edit fstab

  # Primary mount point
  /dev/xvdf /usr/local/wormbase/intermine/data ext3 noatime 0 0
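Editing fstab by hand is easy to repeat by accident. The sketch below appends the entry only when the mount point is not already listed. add_fstab_entry is a hypothetical helper, and the example runs against a scratch file rather than the live /etc/fstab.

```shell
# Hypothetical helper: append an fstab entry only if the mount point is not already listed.
# $1 = fstab file, $2 = device, $3 = mount point, $4 = filesystem type, $5 = options
add_fstab_entry() {
    # awk exits 0 when a non-comment line already names this mount point.
    if awk -v mp="$3" '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$1"; then
        return 0    # entry already present; leave the file untouched
    fi
    printf '%s %s %s %s 0 0\n' "$2" "$3" "$4" "$5" >> "$1"
}

# Try it on a scratch file; calling it twice still leaves a single entry.
scratch="$(mktemp)"
add_fstab_entry "$scratch" /dev/xvdf /usr/local/wormbase/intermine/data ext3 noatime
add_fstab_entry "$scratch" /dev/xvdf /usr/local/wormbase/intermine/data ext3 noatime
cat "$scratch"
```

Applying it to the real file would require root and is left as a manual step.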

Relocate the data directory

 sudo cp --preserve -r /var/lib/pgsql/data /usr/local/wormbase/intermine/data/postgres/.

Edit the PGDATA and data_directory variables in /etc/init.d/postgresql and /var/lib/pgsql/data/postgresql.conf respectively to point to /usr/local/wormbase/intermine/data/postgres/data.


Install Tomcat

Via the package manager:

 $ sudo yum install tomcat6

Or via a stable binary:

 $ curl -O
 $ tar -zxvf apache-tomcat-6* ; cd apache-tomcat*
 // the startup and shutdown scripts are found in apache-tomcat*/bin/
// Set up users. If installed via the package manager:
$ sudo emacs /etc/tomcat6/tomcat-users.xml
 <role rolename="manager"/>
 <user username="manager" password="manager" roles="manager"/>

Edit the port that Tomcat listens on in /etc/tomcat6/server.xml to something that suits your architecture (for example 9999).
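The port change can be scripted with sed. This is a rough sketch that assumes the HTTP connector is still on Tomcat's default port 8080. set_tomcat_port is a hypothetical helper, and the example works on a scratch file rather than the live server.xml.

```shell
# Hypothetical helper: rewrite a connector port in a server.xml-style file.
# $1 = file, $2 = old port, $3 = new port
set_tomcat_port() {
    sed "s/port=\"$2\"/port=\"$3\"/" "$1" > "$1.new" && mv "$1.new" "$1"
}

# Demonstrate on a scratch file containing a single connector line.
conf="$(mktemp)"
printf '%s\n' '<Connector port="8080" protocol="HTTP/1.1" redirectPort="8443" />' > "$conf"
set_tomcat_port "$conf" 8080 9999
cat "$conf"
```

The pattern is case-sensitive, so redirectPort is left alone. Tomcat needs a restart before the new port takes effect.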

Getting the source

You'll need both the intermine source code and the various wormbase "website-intermine" repositories.

> cd /usr/local/wormbase    // or whatever
> mkdir intermine ; chgrp wormbase intermine ; cd intermine
> svn co svn://
> ln -s intermine_0_98 current

Fetch the WormBase-specific elements of intermine, currently maintained as separate repositories.

// The master directory, containing model and code for fetching/parsing datasets
> cd intermine
> git clone wormmine
// sources
> cd bio/sources
> git clone git:// wormmine

Configure your environment

Copy a suitable starting point for the configuration file that defines database location, username, and password.

$ cd
$ mkdir .intermine
$ cp /usr/local/wormbase/intermine/current/wormmine/ ~/.intermine/.

Or if you're more into symlinks (recommended):

$ cd ~/.intermine
$ ln -s /usr/local/wormbase/intermine/current/wormmine/

Add some useful flags for ant to your ~/.bash_profile:

export ANT_OPTS="-server -XX:MaxPermSize=256M -Xmx1700m -XX:+UseParallelGC -Xms1700m -XX:SoftRefLRUPolicyMSPerMB=1 -XX:MaxHeapFreeRatio=99"

Create Databases

$ cd /usr/local/wormbase/intermine/current/wormmine
$ createdb -E SQL_ASCII wormmine
$ createdb -E SQL_ASCII wormmine-items

Build the Database

$ cd ${PROJECT_HOME}/wormmine/dbmodel
$ ant clean
$ ant build-db

Data Sources

Data sources are collected under /usr/local/wormbase/intermine/data, organized by release and then by source.


e.g. /usr/local/wormbase/intermine/data/WS227/uniprot

And for species-specific data:

e.g. /usr/local/wormbase/intermine/data/WS227/wormbase/c_elegans/gff3/

To avoid having to edit/re-edit the "project.xml" file for each new release, we use a symlink to point to the appropriate data for a new build:

$ ls /usr/local/wormbase/intermine/data
  current -> WS228
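Repointing the symlink for a new release can be sketched as below. point_current_at is a hypothetical helper, and the example runs in a scratch directory instead of /usr/local/wormbase/intermine/data.

```shell
# Hypothetical helper: repoint the "current" symlink inside a data directory.
# $1 = data directory, $2 = release name (e.g. WS228)
point_current_at() {
    [ -d "$1/$2" ] || { echo "no such release directory: $1/$2" >&2; return 1; }
    # -n replaces the existing symlink itself instead of descending into its target.
    ln -sfn "$2" "$1/current"
}

# Demonstrate in a scratch directory with two release directories.
data_dir="$(mktemp -d)"
mkdir "$data_dir/WS227" "$data_dir/WS228"
point_current_at "$data_dir" WS227
point_current_at "$data_dir" WS228
readlink "$data_dir/current"
```

The link target is relative, matching the `current -> WS228` listing above, so the data directory can be moved without breaking the link.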

Genomic Sequence

Genomic Annotations


./scripts/ --release [WSRELEASE]

WormBase specific modifications

The following directories currently contain WormMine-specific modifications that are NOT under version control:

  • The entire intermine/wormmine directory. Relocating it and symlinking intermine/wormmine -> our/source/repo doesn't work.
  • The intermine/bio/sources directory. This *can* be relocated in wormmine/project.xml, but build paths in sources need to be updated, too.

Create some new sources

We'll create a custom source for loading GFF3. We want to fetch and parse it first.

$ cd intermine/current
$ ./bio/scripts/make_source wormbase-gff3 gff


Notes about this configuration


  • We might want to be using PostgreSQL 9
  • The postgres user has remote access (don't do this)
  • The postgresql server is using the default port - probably not a major concern. While security through obscurity is not a sufficient security implementation by itself, it does not hurt.
  • The entire database is accessible to all users remotely (similar to postgres user having remote access)
  • The pgdata directory is on the EBS root device. If the instance goes down, the data goes with it (unless you manually back it up). You also cannot use that same EBS device on another instance, which makes vertical scaling difficult.
  • We are using the 'trust' option in pg_hba.conf. As long as you limit it to socket connections, this can be acceptable, but you should really avoid 'trust'. If you do want to use it, at least limit it to particular users and be sure it is only available for socket connections.
  • Directory location/size. Using the default database cluster directory is not a terrible idea, but it should really be on a different device, which lets you scale the size of the database cluster more conveniently. Generally, it is nice to move the pgdata directory off the EBS root device and into its own directory (I like /pgdata), which keeps mount points clean.

Tips and Tricks

Find the total number of objects in the mine

psql -U intermine wormmine
psql (8.4.9)
Type "help" for help.
wormmine=> select count(*) from intermineobject;
wormmine=> select class,count(*) from intermineobject group by class;