Difference between revisions of "Website:WormMine"

From WormBaseWiki

Revision as of 19:03, 13 January 2012

Background

Based on the Getting Started Tutorial:

   http://intermine.org/wiki/GettingStarted

Users and Groups

sudo groupadd wormbase
sudo adduser -M intermine  # the user that connects to the mine database
sudo passwd intermine      # enter the password at the prompt
sudo useradd tharris       # not strictly required, of course
sudo usermod -a -G wormbase,intermine tharris

Install Prerequisites (Amazon Linux)

(see http://intermine.org/wiki/Prerequisites)

 $ sudo yum install svn git
 # Amazon Linux on EC2 comes Java-ready; on Debian: sudo apt-get install sun-java6-sdk
 $ sudo yum install ant
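
Before moving on, it can help to confirm that the tools are actually on the PATH. The following is a minimal sketch for a POSIX shell; the function name missing_tools is made up for illustration and is not part of InterMine.

```shell
# Print the name of every tool in the argument list that is not on PATH.
# missing_tools is a hypothetical helper, not part of InterMine.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools this page relies on; no output means everything was found.
missing_tools svn git ant java
```

Anything it prints still needs to be installed with yum (or apt-get on Debian).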

Postgres

 $ sudo yum install postgresql postgresql-server postgresql-devel postgresql-contrib postgresql-docs

Set up postgresql users and create a database called "wormmine" accessible to a user called "intermine".

$ sudo service postgresql initdb

Configure Postgres to allow logging in via password:

$ sudo emacs /var/lib/pgsql/data/pg_hba.conf
host all     all   0.0.0.0/0   password

Configure suggested performance settings:

$ sudo emacs /var/lib/pgsql/data/postgresql.conf

Note: Performance settings have NOT yet been changed although they've all been entered into the configuration file.

Set up a database:

$ sudo /etc/init.d/postgresql start
[tharris@wb-dev: bin]> sudo -u postgres psql -d template1 -U postgres
psql (8.4.9)
Type "help" for help.

template1=# CREATE USER intermine WITH PASSWORD 'xxxxxxx';
CREATE ROLE
template1=# create database wormmine;
template1=# create database "items-wormmine";
CREATE DATABASE
template1=# GRANT ALL PRIVILEGES ON DATABASE wormmine to intermine;
template1=# GRANT ALL PRIVILEGES ON DATABASE "items-wormmine" to intermine;
template1=#  \q
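
The database statements above can also be generated rather than typed, which avoids forgetting the quotes around names that contain a hyphen. This is only a sketch: mine_sql is a hypothetical helper, and it assumes the intermine role has already been created as shown above.

```shell
# Emit the SQL for one mine: a main database and an items database, both
# granted to a single role. mine_sql is a hypothetical helper. Names are
# double-quoted because PostgreSQL identifiers containing "-" must be quoted.
mine_sql() {
  mine=$1 role=$2
  for db in "$mine" "items-$mine"; do
    printf 'CREATE DATABASE "%s";\n' "$db"
    printf 'GRANT ALL PRIVILEGES ON DATABASE "%s" TO "%s";\n' "$db" "$role"
  done
}

# Review the statements first; to apply them, pipe the output into:
#   sudo -u postgres psql -d template1
mine_sql wormmine intermine
```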

Note: port 5432 must be open in the security group if external access is desired.
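
A quick way to see whether the port is reachable from another machine is sketched below. It assumes bash (for the /dev/tcp redirection) and the coreutils timeout command; port_open and DB_HOST are illustrative names, not anything provided by PostgreSQL or EC2.

```shell
# Return success if a TCP connection to host:port can be opened within
# three seconds. Relies on bash's /dev/tcp redirection and coreutils timeout.
port_open() {
  timeout 3 bash -c 'exec 3<>/dev/tcp/$1/$2' _ "$1" "$2" 2>/dev/null
}

# DB_HOST is a placeholder; set it to the instance's public hostname.
DB_HOST=${DB_HOST:-localhost}
if port_open "$DB_HOST" 5432; then
  echo "5432 reachable on $DB_HOST"
else
  echo "5432 not reachable on $DB_HOST"
fi
```

A failure here can mean the security group is closed, Postgres is not listening on that interface, or the server is down; the check cannot tell these apart.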

Tomcat

Via the package manager:

 $ sudo yum install tomcat6

Or via a stable binary:

 $ curl -O http://mirror.csclub.uwaterloo.ca/apache/tomcat/tomcat-6/v6.0.33/bin/apache-tomcat-6.0.33.tar.gz
 $ tar -zxvf apache-tomcat-6.0.33.tar.gz ; cd apache-tomcat-6.0.33
 # startup.sh and shutdown.sh are found in apache-tomcat-6.0.33/bin/
# Set up users. If Tomcat was installed via the package manager:
$ sudo emacs /etc/tomcat6/tomcat-users.xml
<tomcat-users>
 <role rolename="manager"/>
 <user username="manager" password="manager" roles="manager"/>
</tomcat-users>

Edit /etc/tomcat6/server.xml to change the port Tomcat listens on to something that suits your architecture (for example, 9999).
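
The port change can be scripted instead of edited by hand. The sketch below assumes the HTTP connector in server.xml still uses Tomcat's default port 8080; set_tomcat_port is a hypothetical helper name, and it keeps a .bak copy of the original file.

```shell
# Rewrite the HTTP connector port in a Tomcat server.xml.
# set_tomcat_port is a hypothetical helper; it assumes the connector still
# uses the default port 8080 and keeps a .bak copy of the original file.
set_tomcat_port() {
  file=$1 port=$2
  cp "$file" "$file.bak" &&
    sed "s/<Connector port=\"8080\"/<Connector port=\"$port\"/" "$file.bak" > "$file"
}

# Example, run as root for the packaged install:
#   set_tomcat_port /etc/tomcat6/server.xml 9999
```

Other connectors, such as AJP on 8009, are left untouched. Restart Tomcat afterwards for the change to take effect.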


Getting the source

You'll need both the intermine source code as well as various wormbase "website-intermine" repositories.

> cd /usr/local/wormbase    # or wherever you keep the code
> mkdir intermine ; chgrp intermine intermine ; cd intermine
> svn co svn://subversion.flymine.org/branches/intermine_0_98
> ln -s intermine_0_98 current

Fetch the WormBase-specific elements of intermine, currently maintained as separate repositories.

# The master directory, containing the model and the code for fetching/parsing datasets
> cd intermine
> git clone git@github.com:WormBase/website-intermine-master.git wormmine
# sources
> cd bio/sources
> git clone git://github.com/WormBase/website-intermine-sources.git wormmine

Configure your environment

Copy a suitable starting point for the configuration file that defines database location, username, and password.

$ cd
$ mkdir .intermine
$ cp /usr/local/wormbase/intermine/current/wormmine/wormmine.properties ~/.intermine/.

Or if you're more into symlinks (recommended):

$ cd ~/.intermine
$ ln -s /usr/local/wormbase/intermine/current/wormmine/wormmine.properties wormmine.properties
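
If the setup is repeated, for example after switching branches, the symlink step can be made safe to re-run. This is a sketch only; link_properties is a made-up helper name.

```shell
# Link a properties file into an InterMine config directory, creating the
# directory if needed and replacing any existing file or stale link of the
# same name. link_properties is a hypothetical helper.
link_properties() {
  src=$1 confdir=$2
  mkdir -p "$confdir" &&
    ln -sfn "$src" "$confdir/$(basename "$src")"
}

# Example:
#   link_properties /usr/local/wormbase/intermine/current/wormmine/wormmine.properties "$HOME/.intermine"
```

Note that -f overwrites a wormmine.properties that was copied into place earlier, so save any local edits first.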


Add some useful flags for ant to your ~/.bash_profile:

export ANT_OPTS="-server -XX:MaxPermSize=256M -Xmx1700m -XX:+UseParallelGC -Xms1700m -XX:SoftRefLRUPolicyMSPerMB=1 -XX:MaxHeapFreeRatio=99"
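
To avoid adding the same export twice when this page is followed more than once, the line can be appended only if it is missing. A minimal sketch; append_once and ANT_LINE are illustrative names.

```shell
# Append a line to a file only if that exact line is not already present.
# append_once is a hypothetical helper.
append_once() {
  line=$1 file=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# The export line from this page, held in a variable for reuse.
ANT_LINE='export ANT_OPTS="-server -XX:MaxPermSize=256M -Xmx1700m -XX:+UseParallelGC -Xms1700m -XX:SoftRefLRUPolicyMSPerMB=1 -XX:MaxHeapFreeRatio=99"'

# Example:
#   append_once "$ANT_LINE" "$HOME/.bash_profile"
```

Open a new login shell, or source ~/.bash_profile, before running ant so the setting is picked up.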

Create Databases

$ cd /usr/local/wormbase/intermine/current/wormmine
$ createdb wormmine
$ createdb wormmine-items

Data Sources

Data sources are collected at /usr/local/wormbase/intermine/data, and within that directory by version.

/usr/local/wormbase/intermine/data/WSVERSION/SOURCE/DATA_TYPE

e.g.:

/usr/local/wormbase/intermine/data/WS227/wormbase/gff3/
/usr/local/wormbase/intermine/data/WS227/dbsnp
/usr/local/wormbase/intermine/data/WS227/uniprot
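
The layout above can be captured in a small helper so that scripts build paths consistently. This is a sketch; data_dir and the DATA_ROOT variable are hypothetical names, and the data type component is treated as optional because some of the examples omit it.

```shell
# Build the directory for one dataset following the
# WSVERSION/SOURCE/DATA_TYPE layout. data_dir is a hypothetical helper;
# DATA_ROOT can be overridden, e.g. for testing.
DATA_ROOT=${DATA_ROOT:-/usr/local/wormbase/intermine/data}

data_dir() {
  # $1 = release (e.g. WS227), $2 = source, $3 = optional data type
  printf '%s/%s/%s%s\n' "$DATA_ROOT" "$1" "$2" "${3:+/$3}"
}

data_dir WS227 wormbase gff3
data_dir WS227 dbsnp

# To create a directory:
#   mkdir -p "$(data_dir WS227 uniprot)"
```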


WormBase specific modifications

The following directories currently contain WormMine-specific modifications that are NOT under version control:

  • The entire intermine/wormmine directory. Relocating it and symlinking intermine/wormmine -> our/source/repo doesn't work.
  • The intermine/bio/sources directory. This *can* be relocated in wormmine/project.xml, but build paths in sources need to be updated, too.


Notes about this configuration

postgresql

  • We might want to be using PostgreSQL 9
  • The postgres user has remote access (don't do this)
  • The PostgreSQL server is using the default port. This is probably not a major concern: security through obscurity is not sufficient by itself, but it does not hurt.
  • The entire database is accessible to all users remotely (similar to postgres user having remote access)
  • The pgdata directory is on the EBS root device. If the instance goes down, the data goes with it (unless you manually back it up). You also cannot use that same EBS device on another instance, which makes vertical scaling difficult.
  • We are using the 'trust' option in pg_hba.conf. This can be acceptable as long as it is limited to socket connections, but 'trust' is best avoided. If you do use it, at least limit it to particular users and make sure it is only available for socket connections.
  • Directory location/size. Using the default database cluster directory is not a terrible idea, but it should really be on a different device, which makes it easier to scale the size of the database cluster. Generally, it is nice to move the pgdata directory off the EBS root device and into its own directory (I like /pgdata); this keeps mount points clean.
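
To follow up on the 'trust' and remote-access notes above, a pg_hba.conf can be scanned for entries that allow network connections without a password. The following is a rough sketch, not a full parser; risky_trust is a hypothetical helper name.

```shell
# List pg_hba.conf entries that allow network ("host...") connections with
# the trust method. Comment and blank lines are skipped.
# risky_trust is a hypothetical helper.
risky_trust() {
  awk '
    /^[ \t]*(#|$)/ { next }
    $1 ~ /^host/ {
      for (i = 2; i <= NF; i++) if ($i == "trust") { print; break }
    }
  ' "$1"
}

# Example, run as root or postgres since the file is not world-readable:
#   risky_trust /var/lib/pgsql/data/pg_hba.conf
```

Lines of type local (socket connections) are deliberately not reported, since that is the one case where the notes above consider 'trust' tolerable.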