Website:WormMine
Background
Based on the Getting Started Tutorial:
http://intermine.org/wiki/GettingStarted
Users and Groups
$ sudo groupadd wormbase
$ sudo adduser -M intermine // will be the user connecting to the mine database
$ sudo passwd intermine // enter the password (xxxxxx) at the prompt
$ sudo useradd tharris // not strictly required of course
$ sudo usermod -a -G wormbase,intermine tharris
Install Prerequisites (Amazon Linux)
(see http://intermine.org/wiki/Prerequisites)
$ sudo yum install svn git
// Amazon's Linux on EC2 comes java ready; on Debian:
$ sudo apt-get install sun-java6-jdk
$ sudo yum install ant antlr ant-antlr
Postgres
$ sudo yum install postgresql postgresql-server postgresql-devel postgresql-contrib postgresql-docs
Set up postgresql users and create a database called "wormmine" accessible to a user called "intermine".
$ sudo service postgresql initdb
Configure Postgres to allow logging in via password:
$ sudo emacs /var/lib/pgsql/data/pg_hba.conf
host all all 0.0.0.0/0 password
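The rule above accepts password logins from any address, and the password method sends the password in clear text. A narrower rule could be sketched as follows. The helper name add_hba_rule, the md5 method (swapped in here for password), and the example network are assumptions for illustration, not part of the original setup.

```shell
# Hypothetical helper: append one host rule to a pg_hba.conf-style file.
# Arguments: file, database, user, CIDR address.
# Uses md5 rather than 'password' so the password is not sent in clear
# text (an assumption; the setup on this page uses 'password').
add_hba_rule() {
    hba_file=$1
    db=$2
    user=$3
    cidr=$4
    printf 'host\t%s\t%s\t%s\tmd5\n' "$db" "$user" "$cidr" >> "$hba_file"
}

# Example (commented out; needs root, and the server must reload its config):
# add_hba_rule /var/lib/pgsql/data/pg_hba.conf wormmine intermine 10.0.0.0/8
```

Postgres uses the first matching rule in pg_hba.conf, so a narrow rule only takes effect if no broader rule precedes it.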
Configure suggested performance settings:
$ sudo emacs /var/lib/pgsql/data/postgresql.conf
Note: Performance settings have NOT yet been changed although they've all been entered into the configuration file.
Set up a database:
$ sudo /etc/init.d/postgresql start
[tharris@wb-dev: bin]> sudo -u postgres psql -d template1 -U postgres
psql (8.4.9)
Type "help" for help.
template1=# CREATE USER intermine WITH PASSWORD 'xxxxxxx';
CREATE ROLE
template1=# create database wormmine;
template1=# create database "wormmine-items";
CREATE DATABASE
template1=# GRANT ALL PRIVILEGES ON DATABASE wormmine to intermine;
template1=# GRANT ALL PRIVILEGES ON DATABASE "wormmine-items" to intermine;
template1=# \q
Note: port 5432 must be open in the EC2 security group if external access to the database is desired.
Install bioseg into Postgres
- Download version >= 0.8 from http://www.bioinformatics.org/bioseg/wiki/ (annoying PHP download interface)
$ tar xzf bioseg*
$ cd release-0.8
$ make USE_PGXS=t clean
$ make USE_PGXS=t
$ sudo make USE_PGXS=t install
Install bioseg into template1
$ cd /usr/share/pgsql/contrib
See http://intermine.org/wiki/BiosegInstallation for details.
Tomcat
Via the package manager:
$ sudo yum install tomcat6
Or via a stable binary:
$ curl -O http://mirror.csclub.uwaterloo.ca/apache/tomcat/tomcat-6/v6.0.33/bin/apache-tomcat-6.0.33.tar.gz
$ tar -zxvf apache-tomcat-6.0.33.tar.gz ; cd apache-tomcat-6.0.33
// startup.sh and shutdown.sh are found in apache-tomcat-6.0.33/bin/
// Set up users. If installed via the package manager:
$ sudo emacs /etc/tomcat6/tomcat-users.xml
<tomcat-users>
  <role rolename="manager"/>
  <user username="manager" password="manager" roles="manager"/>
</tomcat-users>
Edit /etc/tomcat6/server.xml to change the port Tomcat listens on to one that suits your architecture (for example, 9999).
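The port change can also be scripted. The following is a minimal sketch, assuming the stock server.xml layout in which the HTTP connector begins with <Connector port="8080"; the helper name set_tomcat_port is hypothetical.

```shell
# Hypothetical helper: rewrite the HTTP connector port in a Tomcat server.xml.
# Assumes the stock layout, where the connector starts with
# <Connector port="8080" ...; connectors on other ports (e.g. AJP on 8009)
# are left untouched.
set_tomcat_port() {
    server_xml=$1
    new_port=$2
    tmp_file="$server_xml.tmp.$$"
    # Write to a temp file, then copy the contents back so the original
    # file keeps its ownership and permissions.
    sed "s/<Connector port=\"8080\"/<Connector port=\"$new_port\"/" \
        "$server_xml" > "$tmp_file" &&
        cat "$tmp_file" > "$server_xml" &&
        rm -f "$tmp_file"
}

# Example (commented out; needs root, and Tomcat must be restarted):
# set_tomcat_port /etc/tomcat6/server.xml 9999
```

If the connector in your file is laid out differently, the sed pattern will not match and the file is left unchanged, so check the result before restarting Tomcat.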
Getting the source
You'll need both the intermine source code and the various wormbase "website-intermine" repositories.
> cd /usr/local/wormbase // or whatever
> mkdir intermine ; chgrp intermine intermine ; cd intermine
> svn co svn://subversion.flymine.org/branches/intermine_0_98
> ln -s intermine_0_98 current
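The "current" symlink makes it possible to switch releases without touching paths elsewhere. A small sketch of repointing it safely follows; the helper name set_current is hypothetical and not part of InterMine.

```shell
# Hypothetical helper: point the 'current' symlink inside a checkout
# directory at a given release directory, replacing any existing link.
# Arguments: base directory, release directory name (relative to base).
set_current() {
    base_dir=$1
    release=$2
    if [ ! -d "$base_dir/$release" ]; then
        echo "no such release: $release" >&2
        return 1
    fi
    # -n stops ln from following the old link and creating the new link
    # inside the directory it points to.
    ln -sfn "$release" "$base_dir/current"
}

# Example (commented out):
# set_current /usr/local/wormbase/intermine intermine_0_98
```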
Fetch the WormBase-specific elements of intermine, currently maintained as separate repositories.
// The master directory, containing model and code for fetching/parsing datasets
> cd intermine
> git clone git@github.com:WormBase/website-intermine-master.git wormmine
// sources
> cd bio/sources
> git clone git://github.com/WormBase/website-intermine-sources.git wormmine
Configure your environment
Copy a suitable starting point for the configuration file that defines database location, username, and password.
$ cd
$ mkdir .intermine
$ cp /usr/local/wormbase/intermine/current/wormmine/wormmine.properties ~/.intermine/.
Or if you're more into symlinks (recommended):
$ cd ~/.intermine
$ ln -s /usr/local/wormbase/intermine/current/wormmine/wormmine.properties wormmine.properties
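After copying or linking the file, it can help to confirm that it points at the intended database and user. The following is a minimal sketch of reading one value from a Java-style properties file; the helper name get_prop is hypothetical, and the key name in the example is an assumption that should be checked against your own wormmine.properties.

```shell
# Hypothetical helper: print the value of one key from a Java-style
# .properties file (key=value lines; lines starting with '#' are skipped).
# Prints nothing if the key is absent.
get_prop() {
    props_file=$1
    key=$2
    awk -v k="$key" '
        /^[ \t]*#/ { next }
        # Match "key=" at the very start of the line, then print the rest.
        index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }
    ' "$props_file"
}

# Example (commented out; the key name is an assumption):
# get_prop ~/.intermine/wormmine.properties db.production.datasource.databaseName
```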
Add some useful flags for ant to your ~/.bash_profile:
export ANT_OPTS="-server -XX:MaxPermSize=256M -Xmx1700m -XX:+UseParallelGC -Xms1700m -XX:SoftRefLRUPolicyMSPerMB=1 -XX:MaxHeapFreeRatio=99"
Create Databases
$ cd /usr/local/wormbase/intermine/current/wormmine
$ createdb -E SQL_ASCII wormmine
$ createdb -E SQL_ASCII wormmine-items
Build the Database
$ cd ${PROJECT_HOME}/wormmine/dbmodel
$ ant clean
$ ant build-db
Data Sources
Data sources are collected at /usr/local/wormbase/intermine/data and organized within that directory by version, then by source, then by data type:
/usr/local/wormbase/intermine/data/WSVERSION/SOURCE/DATA_TYPE
e.g.:
/usr/local/wormbase/intermine/data/WS227/wormbase/gff3/
/usr/local/wormbase/intermine/data/WS227/dbsnp
/usr/local/wormbase/intermine/data/WS227/uniprot
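The layout above can be sketched as a small helper that builds and creates the directory for one dataset. The helper name data_dir and the DATA_ROOT override variable are hypothetical; the default root is the location used on this page.

```shell
# Hypothetical helper: build the directory for one dataset following the
# layout DATA_ROOT/WSVERSION/SOURCE[/DATA_TYPE], create it if missing,
# and print the resulting path. DATA_TYPE is optional.
data_dir() {
    root=${DATA_ROOT:-/usr/local/wormbase/intermine/data}
    version=$1
    source_name=$2
    data_type=${3:-}
    dir="$root/$version/$source_name"
    if [ -n "$data_type" ]; then
        dir="$dir/$data_type"
    fi
    mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Examples (commented out):
# data_dir WS227 wormbase gff3
# data_dir WS227 uniprot
```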
WormBase specific modifications
The following directories currently contain WormMine-specific modifications that are NOT under version control:
- The entire intermine/wormmine directory. Relocating it and symlinking intermine/wormmine -> our/source/repo doesn't work.
- The intermine/bio/sources directory. This *can* be relocated in wormmine/project.xml, but build paths in sources need to be updated, too.
Notes about this configuration
postgresql
- We might want to be using PostgreSQL 9
- The postgres user has remote access (don't do this)
- The postgresql server is using the default port - probably not a major concern. While security through obscurity is not a sufficient security implementation by itself, it does not hurt.
- The entire database is accessible to all users remotely (similar to postgres user having remote access)
- The pgdata directory is on the EBS root device. If the instance goes down, the data goes with it (unless you manually back it up). You also cannot use that same EBS device on another instance, which makes vertical scaling difficult.
- We are using the 'trust' option in pg_hba.conf. As long as you limit it to socket connections, this can be acceptable, but you should really avoid 'trust' - if you want to use it, at least limit it to particular users and be sure it is only available for socket connections.
- Directory location/size. Using the default database cluster directory is not a terrible idea, but it should really be on a different device, which lets you scale the size of the database cluster more conveniently. Generally, it is nice to move the pgdata directory off the EBS root device and into its own directory (I like /pgdata); it keeps mount points clean.
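The note about 'trust' above can be checked mechanically. The following is a rough sketch that lists pg_hba.conf rules granting 'trust' over TCP rather than over a local socket; the helper name find_remote_trust is hypothetical.

```shell
# Hypothetical helper: list non-comment pg_hba.conf rules that use 'trust'
# for TCP connections (host, hostssl, hostnossl). Rules of type 'local'
# (socket connections) are ignored. Prints the matching lines, or nothing
# if there are none.
find_remote_trust() {
    awk '
        /^[ \t]*#/ || NF == 0 { next }
        ($1 == "host" || $1 == "hostssl" || $1 == "hostnossl") && $NF == "trust" { print }
    ' "$1"
}

# Example (commented out; reading the file usually requires root):
# find_remote_trust /var/lib/pgsql/data/pg_hba.conf
```

Any line this prints is a candidate for tightening, either by switching to a password-based method or by restricting the rule to particular users.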