Website:WormMine

Background

Based on the Getting Started Tutorial:

   http://intermine.org/wiki/GettingStarted

Users and Groups

sudo groupadd wormbase
sudo adduser -M intermine   # the user that connects to the mine database
sudo passwd intermine       # prompts for the new password
sudo useradd tharris        # not strictly required, of course
sudo usermod -a -G wormbase,intermine tharris

Install Prerequisites (Amazon Linux)

(see http://intermine.org/wiki/Prerequisites)

 $ sudo yum install subversion git
 # Amazon Linux on EC2 ships with Java installed; on Debian: $ sudo apt-get install sun-java6-jdk
 $ sudo yum install ant antlr ant-antlr

Note: JAVA_HOME may incorrectly point to the JRE rather than the JDK. To correct this, remove the trailing /jre from the JAVA_HOME variable. It should look something like this:

export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64
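
The same fix can be scripted. This is a minimal sketch, not part of the original setup; strip_jre is a hypothetical helper name, and the snippet only touches JAVA_HOME if it is already set:

```shell
# Print a JAVA_HOME value with any trailing "/jre" component removed.
# strip_jre is a hypothetical helper name used for illustration.
strip_jre() {
  # Drop one trailing slash first so ".../jre/" is handled too.
  _home="${1%/}"
  printf '%s\n' "${_home%/jre}"
}

# Only rewrite JAVA_HOME when it is already set in this shell.
if [ -n "${JAVA_HOME:-}" ]; then
  JAVA_HOME="$(strip_jre "$JAVA_HOME")"
  export JAVA_HOME
fi
```

Put it in ~/.bash_profile after the line that sets JAVA_HOME.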

Postgres

 $ sudo yum install postgresql postgresql-server postgresql-devel postgresql-contrib postgresql-docs

Set up postgresql users and create a database called "wormmine" accessible to a user called "intermine".

$ sudo service postgresql initdb

Configure Postgres to allow logging in with a password:

$ sudo emacs /var/lib/pgsql/data/pg_hba.conf
host all     all   0.0.0.0/0   password

Configure suggested performance settings:

$ sudo emacs /var/lib/pgsql/data/postgresql.conf

Note: Performance settings have NOT yet been changed although they've all been entered into the configuration file.

Set up a database:

$ sudo /etc/init.d/postgresql start
[tharris@wb-dev: bin]> sudo -u postgres psql -d template1 -U postgres
psql (8.4.9)
Type "help" for help.

template1=# CREATE USER intermine WITH PASSWORD 'xxxxxxx';
CREATE ROLE
template1=# create database wormmine;
template1=# create database "wormmine-items";
CREATE DATABASE
template1=# GRANT ALL PRIVILEGES ON DATABASE wormmine to intermine;
template1=# GRANT ALL PRIVILEGES ON DATABASE "wormmine-items" to intermine;
template1=#  \q

Note: port 5432 must be open in the EC2 security group if external access is desired.
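
The interactive session above can also be run non-interactively. A rough sketch, assuming the password contains no single quotes; wormmine_sql is a hypothetical helper that only prints the statements, so they can be reviewed before being piped into psql:

```shell
# Emit the SQL from the interactive session above for a given role and password.
# wormmine_sql is a hypothetical helper; it assumes the password has no single quotes.
wormmine_sql() {
  _role="$1"
  _password="$2"
  cat <<EOF
CREATE USER ${_role} WITH PASSWORD '${_password}';
CREATE DATABASE wormmine;
CREATE DATABASE "wormmine-items";
GRANT ALL PRIVILEGES ON DATABASE wormmine TO ${_role};
GRANT ALL PRIVILEGES ON DATABASE "wormmine-items" TO ${_role};
EOF
}

# Example (not run here): pipe the statements into psql as the postgres user.
# wormmine_sql intermine 'xxxxxxx' | sudo -u postgres psql -d template1 -U postgres
```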


Install bioseg into Postgres

$ tar xzf bioseg*
$ cd release-0.8
$ make USE_PGXS=t clean
$ make USE_PGXS=t
$ sudo make USE_PGXS=t install

Install bioseg into template1

$ cd /usr/share/pgsql/contrib

See http://intermine.org/wiki/BiosegInstallation for details.

Tomcat

Via the package manager:

 $ sudo yum install tomcat6

Or via a stable binary:

 $ curl -O http://mirror.csclub.uwaterloo.ca/apache/tomcat/tomcat-6/v6.0.33/bin/apache-tomcat-6.0.33.tar.gz
 $ tar -zxvf apache-tomcat-6* ; cd apache-tomcat*
 # startup.sh and shutdown.sh are found in apache-tomcat*/bin/
# Set up users. If Tomcat was installed via the package manager:
$ sudo emacs /etc/tomcat6/tomcat-users.xml
<tomcat-users>
 <role rolename="manager"/>
 <user username="manager" password="manager" roles="manager"/>
</tomcat-users>

Edit /etc/tomcat6/server.xml and change the port that Tomcat listens on to something that suits your architecture (for example 9999).
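
The port change can be scripted instead of edited by hand. A sketch only: set_tomcat_port is a hypothetical helper, and it assumes the connector line still starts with <Connector port="8080" as in Tomcat's stock server.xml:

```shell
# Rewrite the HTTP connector port in a Tomcat server.xml.
# Usage: set_tomcat_port FILE NEW_PORT [OLD_PORT]
# set_tomcat_port is a hypothetical helper; OLD_PORT defaults to 8080,
# assumed here to be the port in the stock server.xml.
set_tomcat_port() {
  _file="$1"
  _new="$2"
  _old="${3:-8080}"
  _tmp="$(mktemp)" || return 1
  # Edit into a temporary file, then copy the result back over the original
  # so the file keeps its owner and permissions.
  sed "s/<Connector port=\"${_old}\"/<Connector port=\"${_new}\"/" "$_file" > "$_tmp" &&
    cat "$_tmp" > "$_file"
  _status=$?
  rm -f "$_tmp"
  return "$_status"
}
```

With the packaged install the file is owned by root, so run set_tomcat_port /etc/tomcat6/server.xml 9999 from a root shell, then restart Tomcat.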

Getting the source

You'll need both the InterMine source code and the WormBase "website-intermine" repositories.

> cd /usr/local/wormbase    # or whatever
> mkdir intermine ; chgrp intermine intermine ; cd intermine
> svn co svn://subversion.flymine.org/branches/intermine_0_98
> ln -s intermine_0_98 current

Fetch the WormBase-specific elements of intermine, currently maintained as separate repositories.

# The master directory, containing model and code for fetching/parsing datasets
> cd intermine
> git clone git@github.com:WormBase/website-intermine-master.git wormmine
# sources
> cd bio/sources
> git clone git://github.com/WormBase/website-intermine-sources.git wormmine

Configure your environment

Copy a suitable starting point for the configuration file that defines database location, username, and password.

$ cd
$ mkdir .intermine
$ cp /usr/local/wormbase/intermine/current/wormmine/wormmine.properties ~/.intermine/.

Or if you're more into symlinks (recommended):

$ cd ~/.intermine
$ ln -s /usr/local/wormbase/intermine/current/wormmine/wormmine.properties wormmine.properties
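
The symlink step fails if ~/.intermine does not exist yet or the link is already there. A small sketch that handles both cases; link_mine_properties is a hypothetical helper name, and ln -n is assumed to be available (it is in GNU and BSD ln):

```shell
# Link a properties file into an .intermine directory, replacing any old link.
# Usage: link_mine_properties SOURCE_FILE [TARGET_DIR]
# link_mine_properties is a hypothetical helper; TARGET_DIR defaults to ~/.intermine.
link_mine_properties() {
  _src="$1"
  _dir="${2:-$HOME/.intermine}"
  if [ ! -f "$_src" ]; then
    echo "missing properties file: $_src" >&2
    return 1
  fi
  # Create the directory if needed, then (re)create the symlink in it.
  mkdir -p "$_dir" &&
    ln -sfn "$_src" "$_dir/$(basename "$_src")"
}
```

For this setup the call would be link_mine_properties /usr/local/wormbase/intermine/current/wormmine/wormmine.properties.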


Add some useful flags for ant to your ~/.bash_profile:

export ANT_OPTS="-server -XX:MaxPermSize=256M -Xmx1700m -XX:+UseParallelGC -Xms1700m -XX:SoftRefLRUPolicyMSPerMB=1 -XX:MaxHeapFreeRatio=99"

Create Databases

$ cd /usr/local/wormbase/intermine/current/wormmine
$ createdb -E SQL_ASCII wormmine
$ createdb -E SQL_ASCII wormmine-items

Build the Database

$ cd ${PROJECT_HOME}/wormmine/dbmodel
$ ant clean
$ ant build-db


Data Sources

Data sources are collected at /usr/local/wormbase/intermine/data, and within that directory by version.

/usr/local/wormbase/intermine/data/WSVERSION/SOURCE/DATA_TYPE

For example:

/usr/local/wormbase/intermine/data/WS227/wormbase/gff3/
/usr/local/wormbase/intermine/data/WS227/dbsnp
/usr/local/wormbase/intermine/data/WS227/uniprot
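
Scripts that fetch or parse data can build these paths from the layout above rather than hard-coding them. A minimal sketch; data_dir and the WORMMINE_DATA variable are hypothetical names, not something the build itself reads:

```shell
# Root of the data tree described above; override by exporting WORMMINE_DATA.
# WORMMINE_DATA is a hypothetical variable name used for illustration.
WORMMINE_DATA="${WORMMINE_DATA:-/usr/local/wormbase/intermine/data}"

# Print the directory for a release, source, and optional data type.
# Usage: data_dir WSVERSION SOURCE [DATA_TYPE]
data_dir() {
  _path="$WORMMINE_DATA/$1/$2"
  if [ -n "${3:-}" ]; then
    _path="$_path/$3"
  fi
  printf '%s\n' "$_path"
}
```

For instance, mkdir -p "$(data_dir WS227 wormbase gff3)" creates the GFF3 directory for WS227.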

WormBase specific modifications

The following directories currently contain WormMine-specific modifications that are NOT under version control:

  • The entire intermine/wormmine directory. Relocating it and symlinking intermine/wormmine -> our/source/repo doesn't work.
  • The intermine/bio/sources directory. This *can* be relocated in wormmine/project.xml, but build paths in sources need to be updated, too.


Notes about this configuration

postgresql

  • We might want to be using PostgreSQL 9
  • The postgres user has remote access (don't do this)
  • The PostgreSQL server is using the default port. This is probably not a major concern: security through obscurity is not a sufficient security implementation by itself, but it does not hurt.
  • The entire database is accessible to all users remotely (similar to postgres user having remote access)
  • The pgdata directory is on the EBS root device. If the instance goes down, the data goes with it (unless you manually back it up). You also cannot use that same EBS device on another instance, which makes vertical scaling difficult.
  • We are using the 'trust' option in pg_hba.conf. This can be acceptable as long as you limit it to socket connections, but you should really avoid 'trust'. If you do use it, at least limit it to particular users and be sure it is only available for socket connections.
  • Directory location/size. Using the default database cluster directory is not a terrible idea, but it should really be on a different device, which lets you scale the size of the database cluster more conveniently. Generally, it is nice to move the pgdata directory off the EBS root device and into its own directory (I like /pgdata); it keeps mount points clean.
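
As a sketch of what tighter remote access could look like, the snippet below prints pg_hba.conf lines that admit only the intermine role, only to the two mine databases, and only with md5 password authentication. pg_hba_host_rule is a hypothetical helper and 10.0.0.0/16 is a placeholder subnet; neither comes from the current configuration:

```shell
# Print one pg_hba.conf "host" line: database, user, client CIDR, auth method.
# pg_hba_host_rule is a hypothetical helper name used for illustration.
pg_hba_host_rule() {
  printf 'host\t%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Hypothetical values: only the intermine role, only the two mine databases,
# only a placeholder private subnet, and md5 passwords instead of trust.
pg_hba_host_rule wormmine intermine 10.0.0.0/16 md5
pg_hba_host_rule wormmine-items intermine 10.0.0.0/16 md5
```

The printed lines would replace the catch-all host entry in /var/lib/pgsql/data/pg_hba.conf, followed by a PostgreSQL reload.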