Website:WormMine
Latest revision as of 14:55, 19 June 2014
Background
Based on the Getting Started Tutorial:
http://intermine.org/wiki/GettingStarted
Users and Groups
  sudo groupadd wormbase
  sudo adduser -M intermine    # will be the user connecting to the mine database
  sudo passwd intermine        # prompts for the new password
  sudo useradd tharris         # not strictly required of course
  sudo usermod -a -G wormbase intermine
  sudo usermod -a -G wormbase tharris
Install Prerequisites (Amazon Linux)
(see http://intermine.org/wiki/Prerequisites)
  $ sudo yum install svn git
  # Amazon Linux on EC2 comes with Java installed; on Debian: sudo apt-get install sun-java6-jdk
  $ sudo yum install ant antlr ant-antlr
Note: JAVA_HOME may be incorrectly set to the JRE rather than the JDK. To correct this, remove the trailing /jre from the JAVA_HOME variable. It should look something like this:
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64
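The correction described in the note can be sketched as a small shell helper. This is only an illustration, assuming a POSIX-compatible shell; the function name strip_jre is hypothetical and not part of any InterMine tooling.

```shell
# Hypothetical helper: print the given path with one trailing "/jre" removed.
# Paths that do not end in "/jre" are printed unchanged.
strip_jre() {
  printf '%s\n' "${1%/jre}"
}

# Normalise JAVA_HOME in the current shell, but only if it is already set.
if [ -n "${JAVA_HOME:-}" ]; then
  JAVA_HOME="$(strip_jre "$JAVA_HOME")"
  export JAVA_HOME
fi
```

Putting the same lines in ~/.bash_profile would apply the correction on every login.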
Postgres
$ sudo yum install postgresql postgresql-server postgresql-devel postgresql-contrib postgresql-docs
Set up postgresql users and create a database called "wormmine" accessible to a user called "intermine".
$ sudo service postgresql initdb
Configure Postgres to allow login via password:
  $ sudo emacs /var/lib/pgsql9/data/pg_hba.conf
  # add the following line:
  host all all 0.0.0.0/0 password
Configure suggested performance settings:
$ sudo emacs /var/lib/pgsql9/data/postgresql.conf
Note: Performance settings have NOT yet been changed although they've all been entered into the configuration file.
Set up a database:
$ sudo /etc/init.d/postgresql start
  [tharris@wb-dev: bin]> sudo -u postgres psql -d template1 -U postgres
  psql (8.4.9)
  Type "help" for help.

  # Create the intermine role first; the databases below are created as that user.
  $ sudo -u postgres createuser intermine
  $ createdb -U intermine wormmine
  $ createdb -U intermine wormmine-test
  $ createdb -U intermine wormmine-items
  $ createdb -U intermine unittest
  $ createdb -U intermine truncunittest
  $ createdb -U intermine fulldataset
  $ createdb -U intermine flatmodetest
  $ createdb -U intermine notxmltest
  $ createdb -U intermine bio-test
  $ createdb -U intermine bio-fulldata-test
  $ createdb -U intermine wormmine-userprofile
Note: port 5432 must be open in the security group if external access is desired.
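Because the createdb calls differ only in the database name, they can be generated in a loop. The following is a minimal sketch, assuming the intermine role already exists; the variable WORMMINE_DBS and the function print_createdb_commands are hypothetical names. It only prints the commands, so nothing is created until the output is piped to a shell.

```shell
# Database names taken from the list above.
WORMMINE_DBS="wormmine wormmine-test wormmine-items unittest truncunittest fulldataset flatmodetest notxmltest bio-test bio-fulldata-test wormmine-userprofile"

# Hypothetical helper: print one createdb command per database for the given owner.
print_createdb_commands() {
  owner="$1"
  for db in $WORMMINE_DBS; do
    printf 'createdb -U %s %s\n' "$owner" "$db"
  done
}

# Dry run: show the commands without executing them.
print_createdb_commands intermine
```

To actually create the databases, pipe the output to a shell: print_createdb_commands intermine | sh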
Install bioseg into Postgres
- Download >= 0.8 http://www.bioinformatics.org/bioseg/wiki/ (annoying PHP download interface)
  $ tar xzf bioseg*
  $ cd release-0.8
  $ make USE_PGXS=t clean
  $ make USE_PGXS=t
  $ sudo make USE_PGXS=t install
Install bioseg into template1
$ cd /usr/share/pgsql/contrib
See http://intermine.org/wiki/BiosegInstallation for details.
Add a 100 GB data mount at /dev/xvdf
1. Create a new 100 GB volume in the console.
2. In the console, attach the volume at /dev/sdf (on the instance it shows up as /dev/xvdf).
3. SSH to the proxy and format the disk
  > sudo mkfs.ext3 /dev/xvdf
4. Mount the disk
  cd /usr/local/wormbase/intermine
  mkdir data
  chown tharris:wormbase data
  chmod 2775 data
  sudo mount /dev/xvdf /usr/local/wormbase/intermine/data
5. Edit fstab
  # Primary mount point
  /dev/xvdf /usr/local/wormbase/intermine/data ext3 noatime 0 0
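The fstab step can be made repeatable with a small helper that appends the entry only when the device is not yet listed. This is a sketch under assumptions: the function name add_fstab_entry is hypothetical, and the file path is a parameter so the helper can be tried on a scratch copy before /etc/fstab is touched.

```shell
# Hypothetical helper: append an fstab-style entry to a file unless a line
# for the same device is already present.
#   $1 = file to edit, $2 = device, $3 = full entry line
add_fstab_entry() {
  fstab_file="$1"
  device="$2"
  entry="$3"
  # A line that starts with the device name followed by whitespace counts as present.
  if grep -q "^${device}[[:space:]]" "$fstab_file" 2>/dev/null; then
    return 0
  fi
  printf '%s\n' "$entry" >> "$fstab_file"
}

# Example (as root, on the real file):
# add_fstab_entry /etc/fstab /dev/xvdf "/dev/xvdf /usr/local/wormbase/intermine/data ext3 noatime 0 0"
```

Running the helper twice leaves a single entry, which avoids duplicate mount lines when the setup steps are repeated.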
Relocate the data directory
  sudo cp --preserve -r /var/lib/pgsql/data /usr/local/wormbase/intermine/data/postgres/.
Edit the PGDATA variable in /etc/init.d/postgresql and the data_directory variable in /var/lib/pgsql/data/postgresql.conf so that both point to /usr/local/wormbase/intermine/data/postgres/data.
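The data_directory edit can also be scripted. The sketch below assumes a configuration file in the usual "name = 'value'" format, where the setting may be commented out; the function name set_data_directory is hypothetical. It leaves a .bak copy next to the edited file.

```shell
# Hypothetical helper: set data_directory in a postgresql.conf-style file.
#   $1 = configuration file, $2 = new data directory
set_data_directory() {
  conf_file="$1"
  new_dir="$2"
  if grep -q "^[#[:space:]]*data_directory[[:space:]]*=" "$conf_file"; then
    # Replace the existing (possibly commented-out) setting in place.
    sed -i.bak "s|^[#[:space:]]*data_directory[[:space:]]*=.*|data_directory = '${new_dir}'|" "$conf_file"
  else
    # No such setting yet: append one.
    printf "data_directory = '%s'\n" "$new_dir" >> "$conf_file"
  fi
}

# Example (as root, after stopping the server):
# set_data_directory /var/lib/pgsql/data/postgresql.conf /usr/local/wormbase/intermine/data/postgres/data
```

The server has to be restarted afterwards for the new location to take effect.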
Tomcat
Via the package manager:
$ sudo yum install tomcat6
Or via a stable binary:
  $ curl -O http://mirror.csclub.uwaterloo.ca/apache/tomcat/tomcat-6/v6.0.33/bin/apache-tomcat-6.0.33.tar.gz
  $ tar -zxvf apache-tomcat-6.0.33.tar.gz ; cd apache-tomcat-6.0.33
  # startup.sh and shutdown.sh are found in apache-tomcat-6.0.33/bin/

  # Set up users. If installed via the package manager:
  $ sudo emacs /etc/tomcat6/tomcat-users.xml

  <tomcat-users>
    <role rolename="manager"/>
    <user username="manager" password="manager" roles="manager"/>
  </tomcat-users>

Edit the port that Tomcat listens on in /etc/tomcat6/server.xml to something that suits your architecture (for example 9999).
Getting the source
You'll need both the intermine source code as well as various wormbase "website-intermine" repositories.
  > cd /usr/local/wormbase    # or whatever
  > mkdir intermine ; chgrp intermine ; cd intermine
  > svn co svn://subversion.flymine.org/branches/intermine_0_98
  > ln -s intermine_0_98 current
Fetch the WormBase-specific elements of intermine, currently maintained as separate repositories.
  # The master directory, containing model and code for fetching/parsing datasets
  > cd intermine
  > git clone git@github.com:WormBase/website-intermine-master.git wormmine
  # sources
  > cd bio/sources
  > git clone git://github.com/WormBase/website-intermine-sources.git wormmine
Configure your environment
Copy a suitable starting point for the configuration file that defines database location, username, and password.
  $ cd
  $ mkdir .intermine
  $ cp /usr/local/wormbase/intermine/current/wormmine/wormmine.properties ~/.intermine/.
Or if you're more into symlinks (recommended):
  $ cd ~/.intermine
  $ ln -s /usr/local/wormbase/intermine/current/wormmine/wormmine.properties wormmine.properties
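The symlink variant can be wrapped so that it creates the configuration directory when needed and replaces a stale link on re-runs. This is a minimal sketch; the function name link_properties is hypothetical.

```shell
# Hypothetical helper: link a properties file into a configuration directory.
#   $1 = path to the properties file, $2 = configuration directory
link_properties() {
  source_file="$1"
  config_dir="$2"
  mkdir -p "$config_dir"
  # -f replaces an existing link; -n avoids following it if it points to a directory.
  ln -sfn "$source_file" "$config_dir/$(basename "$source_file")"
}

# Example:
# link_properties /usr/local/wormbase/intermine/current/wormmine/wormmine.properties "$HOME/.intermine"
```

Because the link points through the "current" symlink, switching the InterMine checkout also switches the properties file in use.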
Add some useful flags for ant to your ~/.bash_profile:
export ANT_OPTS="-server -XX:MaxPermSize=256M -Xmx1700m -XX:+UseParallelGC -Xms1700m -XX:SoftRefLRUPolicyMSPerMB=1 -XX:MaxHeapFreeRatio=99"
Create Databases
  $ cd /usr/local/wormbase/intermine/current/wormmine
  $ createdb -E SQL_ASCII wormmine
  $ createdb -E SQL_ASCII wormmine-items
Build the Database
  $ cd ${PROJECT_HOME}/wormmine/dbmodel
  $ ant clean
  $ ant build-db
Data Sources
Data sources are collected at /usr/local/wormbase/intermine/data, organized within that directory in the following structure.
Minimally:
  /usr/local/wormbase/intermine/data/WSVERSION/SOURCE/
  e.g. /usr/local/wormbase/intermine/data/WS227/uniprot
And for species-specific data:
  /usr/local/wormbase/intermine/data/WSVERSION/SOURCE/SPECIES/DATA_TYPE
  e.g. /usr/local/wormbase/intermine/data/WS227/wormbase/c_elegans/gff3/
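The layout above can be expressed as a small path builder, which may be handy in fetch scripts. This is only a sketch: the function name data_path is hypothetical, and the base directory is the one named in this section.

```shell
# Hypothetical helper: build a data path following the layout above.
#   $1 = WS release, $2 = source, $3 = species (optional), $4 = data type (optional)
data_path() {
  path="/usr/local/wormbase/intermine/data/$1/$2"
  if [ -n "${3:-}" ]; then
    path="$path/$3"
  fi
  if [ -n "${4:-}" ]; then
    path="$path/$4"
  fi
  printf '%s\n' "$path"
}

# Examples matching the two cases above.
data_path WS227 uniprot
data_path WS227 wormbase c_elegans gff3
```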
To avoid having to edit/re-edit the "project.xml" file for each new release, we use a symlink to point to the appropriate data for a new build:
  $ ls /usr/local/wormbase/intermine/data
  current -> WS228
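Repointing the symlink for a new release can be sketched as follows. The function name point_current is hypothetical; it refuses to create a link to a release directory that does not exist, so a typo in the release name does not leave a dangling "current".

```shell
# Hypothetical helper: point DATA_ROOT/current at the given release directory.
#   $1 = data root, $2 = release name (for example WS228)
point_current() {
  data_root="$1"
  release="$2"
  if [ ! -d "$data_root/$release" ]; then
    echo "no such release directory: $data_root/$release" >&2
    return 1
  fi
  # Relative link target, so the link reads "current -> WS228" as in the listing above.
  ln -sfn "$release" "$data_root/current"
}

# Example:
# point_current /usr/local/wormbase/intermine/data WS228
```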
Genomic Sequence
Genomic Annotations
Uniprot
./scripts/get_uniprot.pl --release [WSRELEASE]
WormBase specific modifications
The following directories currently contain WormMine-specific modifications that are NOT under version control:
- The entire intermine/wormmine directory. Relocating it and symlinking intermine/wormmine -> our/source/repo doesn't work.
- The intermine/bio/sources directory. This *can* be relocated in wormmine/project.xml, but build paths in sources need to be updated, too.
Create some new sources
We'll create a custom source for loading GFF3. We want to fetch and parse it first.
  $ cd intermine/current
  $ ./bio/scripts/make_source wormbase-gff3 gff
Uniprot
Notes about this configuration
postgresql
- We might want to be using PostgreSQL 9
- The postgres user has remote access (don't do this)
- The postgresql server is using the default port - probably not a major concern. While security through obscurity is not a sufficient security implementation by itself, it does not hurt.
- The entire database is accessible to all users remotely (similar to postgres user having remote access)
- The pgdata directory is using the EBS root device. If the instance goes down, the data goes with it (unless you manually back it up). You also cannot use that same EBS device on another instance; this makes vertical scaling difficult.
- We are using the 'trust' option in pg_hba.conf. As long as you limit it to socket connections, this can be acceptable, but you should really avoid 'trust'. If you want to use it, at least limit it to particular users and be sure it is only available for socket connections.
- Directory location/size. Using the default database cluster directory is not a terrible idea, but it should really be on a different device, which lets you scale the size of the database cluster more conveniently. Generally, it is nice to move the pgdata directory off the EBS root device, and also to a unique directory (I like /pgdata); this keeps mount points clean.
Tips and Tricks
Find the total number of objects in the 'mine
  psql -U intermine wormmine
  psql (8.4.9)
  Type "help" for help.

  wormmine=> select count(*) from intermineobject;
  wormmine=> select class,count(*) from intermineobject group by class;
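The per-class count can also be run non-interactively, for example from a cron job after a build. The sketch below is an assumption-laden illustration: the function name count_by_class and the PSQL override variable are hypothetical, and the added "order by" is only a convenience. The psql options used are -A (unaligned output), -t (rows only), -F (field separator) and -c (run one command).

```shell
# Client binary; override PSQL (for example PSQL=echo) to see the command line without connecting.
PSQL="${PSQL:-psql}"

# Hypothetical helper: print "class count" pairs, largest classes first.
#   $1 = database name (defaults to wormmine)
count_by_class() {
  "$PSQL" -U intermine -d "${1:-wormmine}" -At -F ' ' \
    -c 'select class, count(*) from intermineobject group by class order by count(*) desc;'
}

# Example:
# count_by_class wormmine | head
```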