Administration:Installing WormMine
How to set up a development instance of WormMine
Contents
- 1 Quick Overview
- 2 Requirements
- 3 Installation / configuration
- 4 Get production database
- 5 Attach to WormBase instance
Quick Overview
- InterMine machines:
- production: 206.108.125.166
- development: 206.108.125.174
- Important directories:
- database backup: /nfs/wormbase/wormmine/database_dumps
- in intermine directory:
- datadir: holds data files
- build_config: config files for build
- redeployment: mine update build
Login information is available in a mine's ~/.intermine/wormmine.properties file.
Please compile a list of the files whose contents need to be updated when incrementing the version number. For example, "website-intermine/acedb-dev/intermine/wormmine/project.xml" needs to be updated because it contains the string "WS239".
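As a starting point for compiling that list, a recursive search for the current release string can help. This is a minimal sketch, assuming GNU grep and that the release string (for example WS239) appears literally in the files; find_release_refs is a hypothetical helper name, not something shipped in the repository.

```shell
# Hypothetical helper: list files under a directory that contain a release string.
# Usage: find_release_refs WS239 website-intermine
find_release_refs() {
    release="$1"
    dir="${2:-.}"
    # -r: recurse, -l: print file names only, -F: treat the string literally.
    # Git metadata is skipped because it is not edited by hand.
    grep -rlF --exclude-dir=.git -- "$release" "$dir" | sort
}
```

Every file it prints is a candidate for the list; the output still needs a manual review, since some hits may be historical notes rather than configuration.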
Requirements
Hardware
Linux
- 8 cores
- 24GB RAM
- ~ 1TB storage
Software
Necessary software and versions:
Software | Minimum Version | Purpose |
---|---|---|
Git | 1.7 | check out and update source code |
Java SDK | 6.0 | build and use InterMine |
Ant | 1.8 | invokes the InterMine build |
Tomcat | 6.0.29 | website |
PostgreSQL | 8.3 | database |
Perl | 5.8.8 | run build scripts |
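To compare an installed version against the minimums in the table, a small comparison helper may be useful. This is a sketch that assumes GNU sort (for the -V version ordering); version_ge is a hypothetical name.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to version $2.
version_ge() {
    # sort -V orders version strings; if the minimum sorts first (or ties),
    # the installed version satisfies it.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example (not executed here): check Git against the 1.7 minimum.
#   version_ge "$(git --version | awk '{print $3}')" 1.7 && echo "Git OK"
```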
Installation / configuration
Dependencies
Git
Install the command line tool:
$ sudo apt-get install git-core
Configure your user and email:
$ git config --global user.name "Name Surname"
$ git config --global user.email "your.email@gmail.com"
Java
Download here.
Since InterMine can be memory intensive, it's helpful to pass JVM options to Ant through the ANT_OPTS environment variable.
$ export ANT_OPTS="-server -XX:MaxPermSize=256M -Xmx1700m -XX:+UseParallelGC -Xms1700m -XX:SoftRefLRUPolicyMSPerMB=1 -XX:MaxHeapFreeRatio=99"
Ant
Refer to ant's manual for installation instructions.
Tomcat
Refer to Tomcat InterMine installation
Post-Installation Environment
export CATALINA_HOME=/YOURPATH/apache-tomcat-6.0.36
Configuring an Alternative HTTP Port
vim apache-tomcat-6.0.36/conf/server.xml
Replace the port number in this context:
<Service name="Catalina">
<Connector port="YOURPORT" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" URIEncoding="UTF-8" />
...
For user-specific port numbers see the Google Doc "Developer Resources at WormBase".
Starting Tomcat
$CATALINA_HOME/bin/startup.sh
Stopping Tomcat
$CATALINA_HOME/bin/shutdown.sh
PostgreSQL
Refer to InterMine PostgreSQL installation guide
Perl
Refer to InterMine Perl installation guide
Download and Install WormMine
Navigate to the folder in which you want to install WormMine.
Download code from Git:
~]$ git clone https://github.com/WormBase/website-intermine.git
Cloning into website-intermine...
remote: Counting objects: 2866, done.
remote: Compressing objects: 100% (1054/1054), done.
remote: Total 2866 (delta 1728), reused 2837 (delta 1699)
Receiving objects: 100% (2866/2866), 25.28 MiB | 4.65 MiB/s, done.
Resolving deltas: 100% (1728/1728), done.
This downloads the entire intermine project repository. The mine itself is a submodule of this.
Explain the relationship between the master, unmerged and dev branches of the "intermine" repository.
The "dev" branch of the "intermine" repository is ahead of "unmerged". Which modifications have been made there?
Initialize submodule
~]$ cd website-intermine
website-intermine]$ git submodule update --init
Submodule 'acedb-dev/intermine' (git@github.com:WormBase/intermine.git) registered for path 'acedb-dev/intermine'
Cloning into acedb-dev/intermine...
Warning: Permanently added the RSA host key for IP address '192.30.252.131' to the list of known hosts.
remote: Counting objects: 379065, done.
remote: Compressing objects: 100% (78659/78659), done.
remote: Total 379065 (delta 233912), reused 377610 (delta 232792)
Receiving objects: 100% (379065/379065), 685.14 MiB | 6.47 MiB/s, done.
Resolving deltas: 100% (233912/233912), done.
Submodule path 'acedb-dev/intermine': checked out 'd640534eda614d60558c6561da6fb9311d6ad893'
This populates the intermine directory at website-intermine/acedb-dev/intermine. It needs to be set to the proper branch.
Navigate to the mine:
website-intermine]$ cd acedb-dev/intermine/
Working with the Development Branch
Get the "website-intermine" repo as before, but do not init the submodule. Instead proceed as follows:
git clone https://github.com/WormBase/website-intermine.git
git clone https://github.com/WormBase/intermine
# Create link to the InterMine sources within the web-site context. Replaces sub-module.
cd website-intermine/acedb-dev
rmdir intermine
ln -s ../../intermine .
cd ../../intermine
# Get development branch. The clone above already fetched it from origin, so no
# extra remote is needed (a github.com/.../tree/unmerged URL is a web page, not a Git remote).
git checkout unmerged
Create properties file
- Create ~/.intermine directory.
- Copy the sample properties file as ~/.intermine/wormmine.properties
- Fill in placeholders as follows:
- <POSTGRES USER PASSWORD>: postgres password for intermine user
- <TOMCAT USER PASSWORD>: tomcat password for intermine user
- <SERVER PUBLIC BASE URL>: base URL of your web server, including port. Sample: http://123.456.789.123:8080
- <CREATE WM ADMIN USERNAME/PASSWORD>: create the primary admin account
- <EMAIL ADDRESS TO SEND HELP EMAILS FROM>: your server should be configured to send emails from this address. This will send users password reset emails and the like.
- <HELP REQUESTS ARE SENT HERE>: can be same address as above, this is where input from the InterMine help form gets sent.
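Before building, it can be worth confirming that no placeholder was left unfilled. The check below is a sketch that assumes placeholders keep the upper-case, angle-bracket form listed above; unfilled_placeholders is a hypothetical helper name.

```shell
# Hypothetical helper: print (with line numbers) any lines that still contain a
# placeholder such as <POSTGRES USER PASSWORD> or <CREATE WM ADMIN USERNAME/PASSWORD>.
# Exit status is 0 if at least one placeholder remains, non-zero otherwise.
unfilled_placeholders() {
    grep -nE '<[A-Z][A-Z /]+>' "$1"
}

# Example (not executed here):
#   if unfilled_placeholders ~/.intermine/wormmine.properties; then
#       echo "Fill in the placeholders listed above before building." >&2
#   fi
```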
Get production database
Build a new production database
I remember from your talk that there is a Java class (or more?) that are specific to WormMine. Please document those: describing what they are good for, how they work, particular quirks that had to be coded (possibly hardcoded).
Please comment/document https://github.com/WormBase/intermine/blob/unmerged/bio/sources/wormbase-acedb/main/src/org/intermine/bio/dataconversion/WormbaseAcedbConverter.java since it is the most likely file that might need touching when the AceDB model changes again.
The build requires:
- Fasta
- GFF3
- GO ontology
- GO association
- Ace XML files
All of these are retrieved from the FTP site, except for the Ace XML files.
Generate data files
These must be created on a machine with a tace instance, and stored in an accessible location.
Generate Ace XML files
Both the blanket dump and the manual queries must be run and saved to XML to represent all covered types.
Blanket dump
On a machine with an Ace instance:
- Download the website-intermine repository as described above
- Navigate to acedb-dev/acedb
website-intermine]$ cd acedb-dev/acedb/
imdump.sh is a shell script which generates XML files for each species in model, into the supplied destination directory. The intermine machine must have access to the directory.
Run it as follows; any folder can be used as the Ace XML dump location:
> ./imdump.sh <ACE XML DUMP>
- It requires the AceDB database location. If none is supplied through the $ACEDB environment variable, it will search /usr for tace and use <its grandparent directory>/wormbase.
Example run with /usr/local/wormbase/wormmine/xmldumps/ as the dump location:
[19:20|jdmswong@ip-10-35-66-254|acedb]$ ./imdump.sh /usr/local/wormbase/wormmine/xmldumps/
Did not specify AceDB dir in $ACEDB. Searching for tace...
ACEDB set to: /usr/local/wormbase/acedb/wormbase
Species
... done.
Gene
... done.
... and so on
If the directory passed to imdump.sh contains a trailing slash, the script will not function correctly.
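One way to guard against the trailing-slash problem is to normalise the path before passing it to the script. This is a sketch using only POSIX parameter expansion; strip_trailing_slash is a hypothetical helper, not part of imdump.sh.

```shell
# Hypothetical helper: print a path with any trailing slashes removed.
strip_trailing_slash() {
    dir="$1"
    # Remove one slash at a time, but leave a bare "/" untouched.
    while [ "$dir" != "/" ] && [ "${dir%/}" != "$dir" ]; do
        dir="${dir%/}"
    done
    printf '%s\n' "$dir"
}

# Example (not executed here):
#   ./imdump.sh "$(strip_trailing_slash /usr/local/wormbase/wormmine/xmldumps/)"
```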
If XML files for each type in models have been created, then the script has executed successfully. The beginning and end of each file may be checked manually to be sure.
If the error displayed in the output below is encountered, Ace is probably being disrupted and the command should be re-run.
jbaran@ip-10-35-66-254:~/src/website-intermine/acedb-dev/acedb$ ./imdump.sh /usr/local/wormbase/website/jbaran/acedbdump
Did not specify AceDB dir in $ACEDB. Searching for tace...
ACEDB set to: /usr/local/wormbase/acedb/wormbase
Species
... done.
Gene
... done.
Transcript
... done.
CDS
... done.
Variation
./imdump.sh: line 40: 19068 Killed $ACEDB_BIN/tace "$ACEDB" > /dev/null <<EOF wb find ${model} show -x -f "$dumpdir/$model.xml" EOF
... done.
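Because the script still prints "... done." after tace has been killed, the failure is easy to miss in a long run. The wrapper below is a sketch that keeps the combined output in a log and flags a run whose output contains a "Killed" notice. run_checking_killed is a hypothetical helper, and the check assumes the word "Killed" does not otherwise appear in normal output.

```shell
# Hypothetical helper: run a command, save stdout+stderr to a log file, echo the
# log, and fail if the command failed or the shell reported a killed process.
# Usage: run_checking_killed <logfile> <command> [args...]
run_checking_killed() {
    log="$1"
    shift
    status=0
    "$@" > "$log" 2>&1 || status=$?
    cat "$log"
    if [ "$status" -ne 0 ] || grep -q 'Killed' "$log"; then
        echo "Dump looks incomplete; re-run the command." >&2
        return 1
    fi
    return 0
}

# Example (not executed here):
#   run_checking_killed imdump.log ./imdump.sh <ACE XML DUMP>
```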
Manual queries
Not all records are desired for some types. In these cases specialized ace queries must be run, a step which is not yet automated.
Commands to generate the WS239 build, in the website-intermine/acedb-dev/acedb directory:
tace "/usr/local/wormbase/acedb/wormbase"
acedb> query find Gene Live
acedb> show -x -f <ACE XML DUMP>/Gene.xml
acedb> query find Protein Corresponding_CDS
acedb> show -x -f <ACE XML DUMP>/Protein.xml
acedb> query find CDS Method="curated"
acedb> show -x -f <ACE XML DUMP>/CDS.xml
acedb> query find Transcript (Gene)
acedb> show -x -f <ACE XML DUMP>/Transcript.xml
acedb> KeySet-Read species.ace
acedb> show -x -f <ACE XML DUMP>/Species.xml
The files generated will reflect the ace queries used to generate them. All species are loaded unless otherwise specified.
Each query in website-intermine/acedb-dev/acedb/manual_queries.txt must be run individually in tace, followed by show -x -f <TYPE>.xml where <TYPE> is the ace type being queried.
These queries represent desirable subsets of those represented types.
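Since this step is not automated, the following is only a sketch of how the command sequence could be generated. It assumes, without having checked the file, that manual_queries.txt holds one query per line in the form "query find <TYPE> ..."; lines in any other form (such as a KeySet-Read) are skipped and still need to be handled by hand. tace_commands_for is a hypothetical helper name.

```shell
# Hypothetical helper: for each "query find <TYPE> ..." line in a queries file,
# print the query followed by the matching "show -x -f <dumpdir>/<TYPE>.xml".
# Usage: tace_commands_for <queries file> <dump directory>
tace_commands_for() {
    queries_file="$1"
    dumpdir="$2"
    awk -v d="$dumpdir" '
        $1 == "query" && $2 == "find" {
            print                                   # the query itself
            print "show -x -f " d "/" $3 ".xml"     # dump the resulting keyset
        }
    ' "$queries_file"
}

# Example (not executed here; review the generated commands before feeding them to tace):
#   tace_commands_for manual_queries.txt <ACE XML DUMP> | tace "$ACEDB"
```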
Acquire and pre-process data files
Copy the Ace XML files over from the shared directory to which the generating machine saved them, or pull them with rsync. rsync cannot copy between two remote hosts, so run it on the target (InterMine) machine:
rsync -av <HOST IP>:<HOST MACHINE FILE LOCATION> <TARGET MACHINE PATH>
The redeployment build acquires the remaining data files from their appropriate data sources, and pre-processes each one accordingly.
On the InterMine machine, in the intermine directory:
- Navigate to the redeployment folder.
intermine]$ cd redeployment/
- The update.properties file should contain these two entries:
release = WS239
ace-xml-dir = /nfs/wormbase/wormmine/acedb_dumps/${release}
The release is used to generate strings, and must match the format of the FTP site. ace-xml-dir is where the build looks for the Ace XML files generated above. This has to be set to <TARGET MACHINE PATH> from above.
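To double-check where the build will look for the Ace XML files, the two entries can be read back and the ${release} reference expanded by hand. This is a sketch assuming simple "key = value" lines; get_prop and resolve_ace_xml_dir are hypothetical helpers, and Ant performs its own expansion independently of them.

```shell
# Hypothetical helper: print the value of the first "key = value" line for a key,
# with surrounding whitespace trimmed.
# Usage: get_prop <properties file> <key>
get_prop() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*\$/\1/p" "$1" | head -n 1
}

# Hypothetical helper: print ace-xml-dir with ${release} replaced by the release value.
resolve_ace_xml_dir() {
    release="$(get_prop "$1" release)"
    get_prop "$1" ace-xml-dir | sed "s|[$]{release}|$release|g"
}

# Example (not executed here):
#   ls "$(resolve_ace_xml_dir update.properties)"
```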
The build downloads and processes the fasta, gff3, go and gaf files, in addition to copying and processing the Ace XML files.
Build configuration
Property | Function |
---|---|
datadir | The data directory for WormMine |
release | WormBase release version to use for paths and filenames |
backup-dirname | directory to back up the old data directory to |
genomic-fasta-species-file | species to download and/or process genomic fasta for |
protein-fasta-species-file | species to download and/or process protein fasta for |
gff3-species-file | same, for gff3s |
ace-classes file | ace classes to copy and/or process |
ant -p will display all invokable tasks, available for individual execution.
- Run the build
This will back up the old data directory into backup-dirname, delete it, then download and process all file types.
redeployment]$ ant
To only download and process:
redeployment]$ ant run-all
Size of datadir directories for WS239 build:
[12:24|jdmswong@wb-intermine-prod|datadir]$ du -sh *
2.7G fasta
28M go
28M go-association
32G wormbase-acedb
34G wormbase-gff3
Example of a build that failed because a configured ace classes file was missing:
get-all-ace-xml:
BUILD FAILED
/home/jbaran/src/intermine/redeployment/build.xml:311: /home/jbaran/src/intermine/redeployment/ace_classes-test.txt doesn't exist
Total time: 85 minutes 15 seconds
Troubleshooting
Malformed JSON string: output produced by "ant run-all"
website-intermine/acedb-dev/intermine/redeployment$ ant run-all
Buildfile: /home/yourusername/src/intermine/redeployment/build.xml
get-assembly-ids:
[exec] malformed JSON string, neither array, object, number, string or atom, at character offset 8424 (before "],\n "full_name..."JSON retrieved from ftp://ftp.wormbase.org/pub/wormbase/releases/WS239/species/ASSEMBLIES.WS239.json
[exec] ) at gen_assemblies.pl line 19.
BUILD FAILED
/home/yourusername/src/intermine/redeployment/build.xml:39: exec returned: 255
Total time: 2 seconds
Fix: Contact Kevin Howe so that the JSON file on the FTP server can be updated. JSON formatting errors occur sometimes when the file is manually edited after its generation.
Build the database
- From the intermine directory, navigate to wormmine
idev]$ cd wormmine
- Run build
idev]$ ../bio/scripts/project_build -b -v localhost wormmine_dump
This will run all sources configured in the intermine/wormmine/project.xml file.
To learn more about the project.xml file, refer to the official documentation.
Any rare issues encountered can be addressed by the InterMine developer list at dev (AT) intermine.org
Build Steps (Preliminary Info)
Building the database refers to the process in which the project build script compiles each source together into a production database.
1. Build DB
Invoked by the -b switch to project_build.
Runs: cd dbmodel ; ant clean build-db
This generates the primary mine model file, and creates a fresh database with the desired schema. To generate the model it merges the model additions for all source types used in the main project.xml file. Ace XML, Gff3, and Fasta sources are the most essential.
Restarting Building from Milestones/Checkpoints
If a build error is encountered, it will appear in the output of the build command as "BUILD FAILED". To restart from the latest checkpoint, run the project build script with the -b (build) flag replaced with -l (recover):
../bio/scripts/project_build -v -l localhost wormmine_dump
This will, instead of rebuilding, attempt to restart by reading the last dump database. Dump databases are created by the build script from sources through specifications in the project.xml through the "dump" attribute. An example:
<source name="wb-acedb-Variation" type="wormbase-acedb" dump="true">
properties go here ...
</source>
The builder will create the <DATABASE NAME>:wb-acedb-Variation database once this source is run. If restarting, the builder will find the most recent of these backup databases, clone it, and resume from there.
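To see which sources currently act as checkpoints, the project.xml can be scanned for the dump attribute. This is a sketch that assumes each <source ...> opening tag sits on a single line, as in the example above; list_dump_sources is a hypothetical helper name.

```shell
# Hypothetical helper: print the name of every <source> whose opening tag
# carries dump="true". Assumes one opening tag per line.
# Usage: list_dump_sources <path to project.xml>
list_dump_sources() {
    grep '<source ' "$1" | grep 'dump="true"' |
        sed -n 's/.*<source[^>]*[[:space:]]name="\([^"]*\)".*/\1/p'
}

# Example (not executed here):
#   list_dump_sources intermine/wormmine/project.xml
```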
About the database
The database name is configured in the properties file as db.production.datasource.databaseName. Each table represents a class in the model, with additional ones representing many-to-many collections and various metadata. The InterMine development team does not currently advise developers to modify the backend database, due to its many layers of inheritance; questions may be directed to the InterMine developer list at dev (AT) intermine.org.
Instantiate database dump
To instantiate a previously built WormMine production database:
- Find your favorite release from WORMMINE DB FTP URL (placeholder, no URL exists)
- Create empty DB
> createdb -U intermine -E SQL_ASCII wormmine
- -U: user set to intermine
- -E: character set used
- Unpack and restore DB
> psql -U intermine -d wormmine -f <WORMMINE RELEASE SQL>
- -U: execute as user
- -d: destination DB
- -f: SQL input file
Copy an existing database
Creates a new database with the contents of an already existing database:
createdb -U intermine -W -T EXISTINGDBNAME NEWDBNAME
Note: the existing database needs to be owned by the "intermine" PostgreSQL user. If that is not the case, then it can be set using the following SQL command:
ALTER DATABASE EXISTINGDBNAME OWNER TO intermine;
Migrating to production
The production database must be moved to the production server to be served by the webapp.
Any paths used in the following commands are arbitrary as long as they are consistent.
Some databases are named along the lines of wormmine-ws239-2. Database names are arbitrary as long as they are referred to correctly in the mine properties file.
Dumping database
On the development machine:
pg_dump -U intermine <DATABASE NAME> -f <PATH TO DUMPFILE>
This creates the database dump file.
Transfer this dump file to the production machine. scp is one of many options. On the development machine:
scp <PATH TO DUMPFILE> <REMOTE MACHINE IP>:<REMOTE DUMPFILE PATH>
Restoring database
On production machine:
createdb <DESTINATION DATABASE NAME>
psql -U intermine -d <DESTINATION DATABASE NAME> -f <REMOTE DUMPFILE PATH>
The database is now instantiated on the production machine.
Configuring mine to database
On the production machine, in ~/.intermine/wormmine.properties, set
db.production.datasource.databaseName=<DESTINATION DATABASE NAME>
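If this setting is changed often (for example when switching between databases such as wormmine-ws239-2), the edit can be scripted. The sketch below assumes the property is written as key=value with no spaces around the equals sign; set_prop is a hypothetical helper, and backing up the properties file first is advisable.

```shell
# Hypothetical helper: set key=value in a properties file, replacing an existing
# "key=" line or appending one if the key is absent.
# Usage: set_prop <properties file> <key> <value>
set_prop() (
    file="$1"; key="$2"; value="$3"
    tmp="$(mktemp)" || exit 1
    awk -v k="$key" -v v="$value" '
        index($0, k "=") == 1 { print k "=" v; found = 1; next }   # replace existing line
        { print }                                                  # keep everything else
        END { if (!found) print k "=" v }                          # append if missing
    ' "$file" > "$tmp" || exit 1
    # Write back through the original path so its permissions are preserved.
    cat "$tmp" > "$file"
    rm -f "$tmp"
)

# Example (not executed here):
#   set_prop ~/.intermine/wormmine.properties db.production.datasource.databaseName wormmine-ws239-2
```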
Create userprofile database
InterMine needs a separate database to track users and their information.
This can be skipped if an existing userprofile database is present.
- Create empty DB
> createdb -U intermine -E SQL_ASCII userprofile-wormmine
- Build the userprofile DB
> cd wormmine/webapp
> ant build-db-userprofile
This formats the empty userprofile database for mine use.
About the userprofile database
The database name is set in the properties file as db.userprofile-production.datasource.databaseName. User information is stored in the userprofile table. Tables that begin with "saved" map users to any data they have saved, such as lists, queries, templates, and so on. List data mapping is stored in bagvalues.
Launch webapp
- Navigate to intermine/wormmine/webapp
- Launch webapp:
> ./xx
This script contains:
ant clean
ant -v default remove-webapp release-webapp
These commands may be run in sequence instead. They clear previous webapp files, remove any existing webapps which may be launched, and compile and release a new webapp.
Test Webapp
You should be able to reach your new instance at <baseurl>/wormmine. The webapp is standalone.
About the webapp
Webapp maintenance is fairly simple. Packaged monitoring services are not provided, and logs are stored in intermine/wormmine/webapp/intermine.log and <$CATALINA_HOME>/logs. Any problems which arise may be handled by rebooting the web application, shutting Tomcat down before starting it again:
$CATALINA_HOME/bin/shutdown.sh
$CATALINA_HOME/bin/startup.sh
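A restart can be wrapped so that the shutdown always happens before the startup, with a short pause in between. This is a sketch: restart_tomcat is a hypothetical helper, and the default pause of 5 seconds is an assumption rather than a measured value.

```shell
# Hypothetical helper: stop Tomcat, wait, then start it again.
# Usage: restart_tomcat [seconds to wait between shutdown and startup]
restart_tomcat() (
    : "${CATALINA_HOME:?CATALINA_HOME must be set}"
    wait_seconds="${1:-5}"
    # A failing shutdown usually means Tomcat was not running; carry on regardless.
    "$CATALINA_HOME/bin/shutdown.sh" ||
        echo "shutdown.sh reported an error (Tomcat may not have been running)" >&2
    sleep "$wait_seconds"
    "$CATALINA_HOME/bin/startup.sh"
)

# Example (not executed here):
#   restart_tomcat 10
```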
Attach to WormBase instance
If you want to enable integration with WormBase, follow these steps:
Checkout merged branch
> git checkout remotes/origin/staging
Note: checking out 'remotes/origin/staging'.
You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may do so (now or later) by using -b with the checkout command again. Example:
git checkout -b new_branch_name
HEAD is now at 611b791... new tests, shell script to run tests
> git checkout -b staging
Switched to a new branch 'staging'
Reconfigure properties file
- Needs to deploy at tools/wormmine
webapp.path=tools/wormmine
- Change url to where the base will be:
webapp.baseurl=http://staging.wormbase.org
webapp.returnurl=http://staging.wormbase.org/auth/openid?openid_identifier=https://www.google.com/accounts/o8/id&redirect=http://dev.wormbase.org/tools/wormmine/mymine.do#
Modify wormbase.conf
To enable the login system, make sure the config flag wormmine_path = 'tools/wormmine' is uncommented.