Administration:WormBase Production Environment
Latest revision as of 20:48, 18 June 2014
DEPRECATED!
Overview
The WormBase production environment consists of a series of partially redundant web and database servers, most sitting behind a load-balancing reverse-proxy server running nginx. This document describes the basic setup and configuration of this environment.
Reverse proxy node
- Two servers, each running nginx as a load-balancing reverse proxy. Built-in memcached support establishes a shared memory cache across all back-end web server nodes. Requests are distributed in round-robin fashion.
Services running on this node: nginx, acedb, starman_webapp, mysql, userdb
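The round-robin distribution comes from a standard nginx upstream block. A minimal sketch, assuming placeholder back-end hostnames (the production node names may differ):

```nginx
# Minimal load-balancing sketch -- wb-web1/wb-web2 are placeholder names.
upstream webapp {
    server wb-web1.oicr.on.ca:5000;   # starman on each back-end node
    server wb-web2.oicr.on.ca:5000;
}

server {
    listen 80;
    location / {
        proxy_pass http://webapp;     # round-robin is the default policy
    }
}
```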
Web server nodes
- Each web cluster node runs the lightweight HTTP server Starman, listening on port 5000. This HTTP server is glued to our Catalyst web application via PSGI/Plack.
- Currently, each node is -- with the exception of GBrowse -- almost entirely independent, with its own AceDB and MySQL databases.
- Web cluster nodes are accessible ONLY to the front end proxy.
Data mining nodes
Social feature node
To resolve
- How/where is the back end node hosting the user database specified?
- Differences in configuration files.
nginx
- SSL
- proxy caching
- to test
  - logging in
  - browser compatibility
- set up automatic updates of code and restarting of services
Nodes
Reverse Proxy Node
nginx
Installation
We'll place nginx entirely within the wormbase root directory. Its configuration and init files are maintained in the wormbase-admin module.
1. Install prerequisites
# Perl Compatible Regular Expression library (on EC2, use yum instead)
sudo apt-get install libpcre3 libpcre3-dev libssl-dev libc6-dev
# Fetch and unpack openssl
wget http://www.openssl.org/source/openssl-0.9.8p.tar.gz
tar -zxf openssl-0.9.8p.tar.gz
2. Get the nginx cache-purge module
cd src/
curl -O http://labs.frickle.com/files/ngx_cache_purge-1.3.tar.gz
tar xzf ngx_cache_purge-1.3.tar.gz
3. Build and install nginx
curl -O http://nginx.org/download/nginx-1.0.14.tar.gz
tar xzf nginx*
cd nginx*
./configure \
  --prefix=/usr/local/wormbase/services/nginx-1.0.14 \
  --error-log-path=/usr/local/wormbase/logs/nginx-error.log \
  --http-log-path=/usr/local/wormbase/logs/nginx-access.log \
  --with-http_stub_status_module \
  --with-http_ssl_module \
  --with-ipv6 \
  --with-http_realip_module \
  --with-http_addition_module \
  --with-http_image_filter_module \
  --with-http_sub_module \
  --with-http_dav_module \
  --with-http_flv_module \
  --with-http_gzip_static_module \
  --with-http_secure_link_module \
  --with-openssl=../openssl-0.9.8p \
  --add-module=../ngx_cache_purge-1.3
make
make install
cd /usr/local/wormbase/services
ln -s nginx-1.0.14 nginx
cd /usr/local/wormbase/services/nginx
mv conf conf.original
ln -s /usr/local/wormbase/website-admin/nginx/production conf
4. Test the configuration file syntax by:
$ nginx -t
Here's a more complicated example demonstrating caching and load balancing: http://nathanvangheem.com/news/nginx-with-built-in-load-balancing-and-caching
About Load Balancing
nginx relies on the NginxHttpUpstreamModule for load balancing. It's built-in by default. The documentation contains a number of possibly useful configuration directives:
http://wiki.nginx.org/NginxHttpUpstreamModule
There are a number of other interesting load-balancing modules that might be of use:
http://wiki.nginx.org/3rdPartyModules
5. Generate SSL certificates
To generate self-signed (dummy) certificates, run the following openssl commands.
First change directory to where you want to create the certificate and private key:
$ cd /usr/local/wormbase/shared/services/nginx/conf
Now create the server private key; you'll be asked for a passphrase:
$ openssl genrsa -des3 -out server.key 1024
Create the Certificate Signing Request (CSR):
$ openssl req -new -key server.key -out server.csr
Remove the necessity of entering a passphrase for starting up nginx with SSL using the above private key:
$ cp server.key server.key.org
$ openssl rsa -in server.key.org -out server.key
Finally, sign the certificate using the above private key and CSR:
$ openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt
Update Nginx configuration to point to the newly signed certificate and private key.
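The relevant server-block directives look roughly like this. The conf path follows the directory used above; the exact filenames and server_name are assumptions:

```nginx
server {
    listen 443 ssl;
    server_name wormbase.org;   # placeholder
    ssl_certificate     /usr/local/wormbase/shared/services/nginx/conf/server.crt;
    ssl_certificate_key /usr/local/wormbase/shared/services/nginx/conf/server.key;
}
```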
Set nginx to start at server launch
sudo cp /usr/local/wormbase/website-admin/nginx/conf/production/nginx.init /etc/init.d/nginx
sudo /usr/sbin/update-rc.d -f nginx defaults
The output will be similar to this:
Adding system startup for /etc/init.d/nginx ...
 /etc/rc0.d/K20nginx -> ../init.d/nginx
 /etc/rc1.d/K20nginx -> ../init.d/nginx
 /etc/rc6.d/K20nginx -> ../init.d/nginx
 /etc/rc2.d/S20nginx -> ../init.d/nginx
 /etc/rc3.d/S20nginx -> ../init.d/nginx
 /etc/rc4.d/S20nginx -> ../init.d/nginx
 /etc/rc5.d/S20nginx -> ../init.d/nginx
Set up nginx log rotation
Add a crontab entry so logs are rotated nightly at 1:30 AM:
30 1 * * * /usr/local/wormbase/website-admin/log_analysis/rotate_nginx_logs.pl
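The rotate_nginx_logs.pl script itself is not reproduced here, but the usual nginx rotation pattern it presumably follows can be sketched in shell. The paths and pid-file location are assumptions based on the log directory used above:

```shell
#!/bin/sh
# Sketch only: archives each nginx log with a date stamp, then asks nginx
# to reopen its log files. Not the production rotate_nginx_logs.pl.
LOG_DIR="${LOG_DIR:-/usr/local/wormbase/logs}"
PID_FILE="${PID_FILE:-/usr/local/wormbase/logs/nginx.pid}"

rotate_logs() {
    stamp=$(date +%Y%m%d)
    for log in "$LOG_DIR"/nginx-*.log; do
        [ -f "$log" ] || continue
        mv "$log" "$log.$stamp"        # archive the current log
    done
    # Tell nginx to reopen its log files, if it is running
    if [ -f "$PID_FILE" ]; then
        kill -USR1 "$(cat "$PID_FILE")"
    fi
}
```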
Starting the Server
sudo /etc/init.d/nginx restart
memcached
Install
$ sudo apt-get install memcached
Configure
Make memcached listen to all IP addresses, not just requests from localhost:
$ emacs /etc/memcached.conf
-l 127.0.0.1     <--- comment out this line
Lock down access to memcached via iptables as described below.
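After editing, the relevant portion of /etc/memcached.conf looks roughly like this. The memory size shown is an example, not the production value:

```
# Run as a daemon
-d
# Cache size in MB (example value; tune for the node)
-m 1024
# Default port, matched by the iptables rules below
-p 11211
# -l 127.0.0.1   <-- commented out so memcached answers on all interfaces
```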
Configure iptables
##################################################
# New architecture
#
# Proxy nodes:
# service     port                accessibility
# nginx       2011, later on 80   all
# memcached   11211               cluster
# starman     5000                localhost
# mysql       3306                cluster, dev
#
# Backend nodes
# service     port                accessibility
# starman     5000                proxy nodes
# memcached   11211               cluster
#
##################################################
# Proxy
# The new website front-end proxy, accessible to the world
$BIN -A INPUT -p tcp --dport 2011 -m state --state NEW -j ACCEPT
# Open MySQL to other production nodes; need access to the sessions database
$BIN -A INPUT -p tcp --dport 3306 -m iprange --src-range 206.108.125.168-206.108.125.190 -j ACCEPT
# Open memcached
$BIN -A INPUT -p tcp --dport 11211 -m iprange --src-range 206.108.125.168-206.108.125.190 -j ACCEPT
# Backend machines
# starman
$BIN -A INPUT -p tcp --dport 5000 -m iprange --src-range 206.108.125.168-206.108.125.190 -j ACCEPT
# memcached
$BIN -A INPUT -p tcp --dport 11211 -m iprange --src-range 206.108.125.168-206.108.125.190 -j ACCEPT
# Let me access backend services directly for debugging
# Starman, port 5000
$BIN -A INPUT -p tcp -s 206.228.142.230 --dport 5000 -m state --state NEW -j ACCEPT
# Old site httpd, port 8080
$BIN -A INPUT -p tcp -s 206.228.142.230 --dport 8080 -m state --state NEW -j ACCEPT
# memcached
$BIN -A INPUT -p tcp -s 206.228.142.230 --dport 11211 -m state --state NEW -j ACCEPT
Then
/etc/init.d/iptables.local restart
Launch services on the front end machine
# nginx
$ sudo /etc/init.d/nginx restart
# memcached
$ sudo /etc/init.d/memcached restart
# starman
$ cd /usr/local/wormbase/website/production/bin
$ ./starman-production.sh start
Webserver Nodes
Webserver nodes mount an NFS share containing (almost) everything they need.
A webserver node expects the following layout:
/usr/local/wormbase
/usr/local/wormbase/acedb
/usr/local/wormbase/databases
/usr/local/wormbase/extlib
/usr/local/wormbase/services
/usr/local/wormbase/website/production
/usr/local/wormbase/website-shared-files
An NFS share provides most of these elements, mounted at /usr/local/wormbase/shared, with symlinks as:
/usr/local/wormbase
/usr/local/wormbase/acedb
/usr/local/wormbase/databases -> shared/databases
/usr/local/wormbase/extlib -> shared/extlib
/usr/local/wormbase/services -> shared/services
/usr/local/wormbase/website/production -> shared/website/production
/usr/local/wormbase/website-shared-files -> shared/website-shared-files
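Setting up a fresh node's links can be scripted. A minimal sketch of the layout above; the function name is ours, and acedb is assumed to stay a real local directory rather than living on the share:

```shell
#!/bin/sh
# Create the expected symlink layout under a node root, assuming the NFS
# share is already mounted at $root/shared.
setup_node_links() {
    root="$1"
    mkdir -p "$root/acedb" "$root/website"
    for dir in databases extlib services website-shared-files; do
        ln -sfn "shared/$dir" "$root/$dir"      # relative links survive remounts
    done
    ln -sfn ../shared/website/production "$root/website/production"
}
```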
Each webserver node hosts its own AceDB database and server, as well as its own MySQL database.
Individual webserver nodes should be configured essentially as described in the Installing WormBase documentation, except that they do not require nginx.
HTTP server: PSGI/Plack + Starman
See Starman: the lightweight http server section in the Installing WormBase documentation.
Memcached/libmemcached
See above for details.
The Webapp
The web app and all Perl libraries will be installed automatically by the deploy_wormbase_webapp.sh script.
/usr/local/wormbase/website/production -> WSXXXX-YYYY.MM.DD-X.XX-XXXX
/usr/local/wormbase/website/WSXXX-YYYY.MM.DD-X.XX-XXXX
For details on installation of the web app itself, see the Install The Webapp section of the main Installing WormBase guide.
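The production symlink scheme above lets a deploy switch releases atomically. A sketch of the swap step such a deploy script might perform (the real deploy_wormbase_webapp.sh is not shown; the function name and release tag are illustrative):

```shell
#!/bin/sh
# Point the "production" symlink at a new release directory.
activate_release() {
    site_dir="$1"    # e.g. /usr/local/wormbase/website
    release="$2"     # e.g. a WSXXX-YYYY.MM.DD-X.XX-XXXX directory name
    [ -d "$site_dir/$release" ] || { echo "no such release: $release" >&2; return 1; }
    ln -sfn "$release" "$site_dir/production"   # replace the link in one step
}
```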
Launch services on back end machines
# memcached
$ sudo /etc/init.d/memcached restart
# starman
cd /usr/local/wormbase/website/production/bin
./starman-production.sh start
Hadoop Distributed File System (HDFS+Hoop)
We use the Hadoop Distributed File System to make it easier and faster to move files around.
Documentation on setting up Hadoop:
http://hadoop.apache.org/common/docs/current/single_node_setup.html
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
Install HDFS on each node.
http://mirror.olnevhost.net/pub/apache//hadoop/common/stable/hadoop-0.20.203.0rc1.tar.gz
Start the cluster:
$ bin/hadoop
Standalone operation
By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.
The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*
cd /usr/local/wormbase/shared/services
curl -O http://mirror.olnevhost.net/pub/apache//hadoop/common/stable/hadoop-0.20.203.0rc1.tar.gz
tar xzf hadoop*
ln -s hadoop-0.20.203 hadoop
sudo apt-get install default-jre
Set up a single node:
addgroup hadoop
adduser --ingroup hadoop hduser
Configure the hduser:
sudo su hduser
Append the following to .bashrc
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/wormbase/services/hadoop

# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-6-sun

# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat $1 | lzop -dc | head -1000 | less
}

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
Create an ssh key:
ssh-keygen -t rsa -P ""
Let the hadoop user connect to localhost:
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
chmod 600 $HOME/.ssh/authorized_keys
Test
ssh localhost
Set up Hadoop. In /usr/local/wormbase/shared/services/hadoop/conf/hadoop-env.sh, set JAVA_HOME:
JAVA_HOME=/usr/lib/jvm/default-java
Configure the directory where Hadoop stores its files:
conf/core-site.xml
<!-- In: conf/core-site.xml -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
Create the temporary directory (note that this path should match the hadoop.tmp.dir value configured above):
$ sudo mkdir -p /usr/local/hdfs/tmp
$ sudo chown hduser:hadoop /usr/local/hdfs/tmp
# ...and if you want to tighten up security, chmod from 755 to 750...
$ sudo chmod 750 /usr/local/hdfs/tmp
In conf/mapred-site.xml
<!-- In: conf/mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
In conf/hdfs-site.xml
<!-- In: conf/hdfs-site.xml -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>
Format the namenode
hduser@ubuntu:~$ /usr/local/wormbase/services/hadoop/bin/hadoop namenode -format
HBase
curl -O http://www.reverse.net/pub/apache//hbase/stable/hbase-0.90.3.tar.gz
CouchDB
We use the document store CouchDB to store and replicate pregenerated HTML and other content across web server nodes.
See the CouchDB manual for a good introduction.
This post on StackOverflow discusses serving HTML via couch, too.
Comparing CouchDB and MongoDB? See this SlideShare presentation and this post on the MongoDB site.
Installation
Install dependencies
sudo apt-get install erlang libicu-dev libmozjs-dev libcurl4-openssl-dev
Build:
cd src
curl -O http://www.apache.org/dyn/closer.cgi?path=/couchdb/1.1.0/apache-couchdb-1.1.0.tar.gz
tar xzf apache-*
cd apache-*
./configure
make && sudo make install
Create a couchdb user if one doesn't already exist
adduser --system \
        --home /usr/local/var/lib/couchdb \
        --no-create-home \
        --shell /bin/bash \
        --group --gecos \
        "CouchDB Administrator" couchdb
Fix permissions
chown -R couchdb:couchdb /usr/local/etc/couchdb
chown -R couchdb:couchdb /usr/local/var/lib/couchdb
chown -R couchdb:couchdb /usr/local/var/log/couchdb
chown -R couchdb:couchdb /usr/local/var/run/couchdb
chmod 0770 /usr/local/etc/couchdb
chmod 0770 /usr/local/var/lib/couchdb
chmod 0770 /usr/local/var/log/couchdb
chmod 0770 /usr/local/var/run/couchdb
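The eight commands above can be collapsed into a loop. A sketch; the owner is parameterized here only so the logic can be exercised without root (in production it would be couchdb:couchdb):

```shell
#!/bin/sh
# Apply CouchDB ownership and permissions across the standard directories.
fix_couch_perms() {
    prefix="$1"   # normally /usr/local
    owner="$2"    # normally couchdb:couchdb
    for d in etc/couchdb var/lib/couchdb var/log/couchdb var/run/couchdb; do
        chown -R "$owner" "$prefix/$d"
        chmod 0770 "$prefix/$d"
    done
}
```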
Set up CouchDB to start automatically.
cp /usr/local/etc/init.d/couchdb /etc/init.d/couchdb
sudo update-rc.d couchdb defaults
Edit the defaults file (/usr/local/etc/couchdb/default.ini) to allow more open permissions
bind_address = 0.0.0.0
Start it up
sudo /etc/init.d/couchdb start
Test it out
curl http://127.0.0.1:5984/
Get a list of databases
curl -X GET http://127.0.0.1:5984/_all_dbs
Create a database for each new release of WormBase
curl -X PUT http://127.0.0.1:5984/wsXXX # We'll create databases for each WSXXX version
Deleting databases
curl -X DELETE http://127.0.0.1:5984/wsXXX
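CouchDB database names must be lowercase, which is why the databases above are wsXXX rather than WSXXX. Release tags can be folded before use; a small helper sketch (the curl line in the comment assumes a local CouchDB):

```shell
#!/bin/sh
# Fold a WormBase release tag (e.g. WS226) into a valid CouchDB db name.
release_to_db() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Usage sketch (requires a running CouchDB):
#   curl -X PUT "http://127.0.0.1:5984/$(release_to_db WS226)"
```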
Create a document
curl -X PUT http://127.0.0.1:5984/wsXXX/UUID
Get a document
curl -X GET http://127.0.0.1:5984/wsXXX/UUID
Create a document with an attachment
curl -X PUT http://127.0.0.1:5984/wsXXX/UUID/attachment \
     -d @/usr/local/wormbase/databases/WS226/cache/gene/overview/WBGene00006763.html \
     -H "Content-Type: text/html"
Get the document's attachment directly.
curl -X GET http://127.0.0.1:5984/wsXXX/UUID/attachment
Securing the database
Add an admin user and password to:
/usr/local/etc/couchdb/local.ini
The password will be hashed automatically.
Populating the database
During the staging process, the precache_widgets.pl script creates then populates a CouchDB instance on the development server. See that script and its related modules for examples.
In the future, we might also want to explore bulk inserts and performance tuning.
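For reference, a bulk insert POSTs a JSON body of this shape to /wsXXX/_bulk_docs; the document IDs and fields below are invented placeholders:

```json
{"docs": [
    {"_id": "WBGene00006763", "widget": "overview"},
    {"_id": "WBGene00006764", "widget": "overview"}
]}
```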
Querying the database
Get some general information about a database
curl -X GET http://127.0.0.1:5984/wsxxx/
Replication
See the guide to replication in the CouchDB book.
Replication is handled by the production/steps/replicate_couchdb.pl script.
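Under the hood, replication amounts to POSTing a body like the following to the server's _replicate endpoint (the target host is a placeholder; create_target makes the target database if it does not exist):

```json
{"source": "wsXXX",
 "target": "http://wb-web2.oicr.on.ca:5984/wsXXX",
 "create_target": true}
```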
Configuration
Edit /etc/couchdb/default.ini to the following.
The Futon Management interface
Couchapp
https://github.com/couchapp/couchapp
Measuring performance:
http://till.klampaeckel.de/blog/archives/16-measuring-couchdb-performance.html
Data mining nodes
The data mining and BLAST/BLAT server replaces the old aceserver. Because it handles requests for the AQL and WB pages, it is configured exactly like the web cluster nodes, with the addition of BLAST, BLAT, and ePCR, and the iptables directives shown above.
WormMart node
Social feature node
The WormBase Blog, the WormBase Wiki, and the Worm Community Forums all rely on third party software. To make it easy to update this software, each of these components is maintained as a separate name-based virtual host running on the same server: wb-social.oicr.on.ca.
The WormBase Blog
The WormBase blog is a subdomain of wormbase.org: blog.wormbase.org. If it's moved, the DNS entry *must* be updated!
Host/Port      : wb-social.oicr.on.ca:80
Alias          : blog.wormbase.org
MySQL database : wormbase_wordpress_blog
Document root  : /usr/local/wormbase/website-blog/current
Logs           : /usr/local/wormbase/blogs-access_log, /usr/local/wormbase/logs/blogs-error_log
Blog files are stored in /usr/local/wormbase/website-blog/current:
current -> wordpress-2.92
Add the following apache configuration to /usr/local/apache2/conf/extras/httpd-vhosts.conf
<VirtualHost *:80>
  ServerName blog.wormbase.org
  DocumentRoot /usr/local/wormbase/website-blog
  <Directory "/usr/local/wormbase/website-blog">
    DirectoryIndex index.php index.html
    AddType application/x-httpd-php .php
    Order Deny,Allow
    Allow from all
  </Directory>
  LogFormat "%h %l %u %t \"%r\" %s %b" common
  LogFormat "%h %l %u %t %{Referer}i \"%{User-Agent}i\" \"%r\" %s %b" combined_format
  LogFormat "witheld %l %u %t \"%r\" %s %b" anonymous
  ErrorLog /usr/local/wormbase/logs/blog-error_log
  CustomLog /usr/local/wormbase/logs/blog-access_log combined_format
</VirtualHost>
NOTE: when upgrading, be sure to copy the wp-config.php file and the entire wp-content/ directory.
The WormBase Wiki
The WormBase Wiki is a subdirectory of the primary WormBase domain. If it's moved, the proxy that sits in front of it must be updated!
Host/Port      : wb-social.oicr.on.ca:80
Alias          : wiki.wormbase.org
MySQL database : wormbase_wiki
Document root  : /usr/local/wormbase/website-wiki/current
Logs           : /usr/local/wormbase/wiki-access_log, /usr/local/wormbase/logs/wiki-error_log
Add the following apache configuration to /usr/local/apache2/conf/extras/httpd-vhosts.conf
<VirtualHost *:80>
  ServerName wiki.wormbase.org
  # Current is a symlink to the current installation.
  DocumentRoot /usr/local/wormbase/website-wiki/current
  <Directory "/usr/local/wormbase/website-wiki/current">
    DirectoryIndex index.php index.html
    AddType application/x-httpd-php .php
    Order Deny,Allow
    Allow from all
  </Directory>
  LogFormat "%h %l %u %t \"%r\" %s %b" common
  LogFormat "%h %l %u %t %{Referer}i \"%{User-Agent}i\" \"%r\" %s %b" combined_format
  LogFormat "witheld %l %u %t \"%r\" %s %b" anonymous
  ErrorLog /usr/local/wormbase/logs/wiki-error_log
  CustomLog /usr/local/wormbase/logs/wiki-access_log combined_format
</VirtualHost>
The Worm Community Forums
The Worm Community Forums are a subdirectory of the primary WormBase domain. If they're moved, the proxy that sits in front of them must be updated!
Host/Port      : wb-social.oicr.on.ca:80
Alias          : forums.wormbase.org
MySQL database : wormbaseforumssmf
Document root  : /usr/local/wormbase/website-forums
Logs           : /usr/local/wormbase/forums-access_log, /usr/local/wormbase/logs/forums-error_log
Add the following apache configuration to /usr/local/apache2/conf/extras/httpd-vhosts.conf
<VirtualHost *:80>
  ServerName forums.wormbase.org
  # Current is a symlink to the current version of SMF
  DocumentRoot /usr/local/wormbase/website-forums/current
  <Directory "/usr/local/wormbase/website-forums/current">
    DirectoryIndex index.php index.html
    AddType application/x-httpd-php .php
    Order Deny,Allow
    Allow from all
  </Directory>
  LogFormat "%h %l %u %t \"%r\" %s %b" common
  LogFormat "%h %l %u %t %{Referer}i \"%{User-Agent}i\" \"%r\" %s %b" combined_format
  LogFormat "witheld %l %u %t \"%r\" %s %b" anonymous
  ErrorLog /usr/local/wormbase/logs/forums-error_log
  CustomLog /usr/local/wormbase/logs/forums-access_log combined_format
</VirtualHost>
Add "Listen 8081" to the primary httpd.conf file.
Note: If the forum is moved, it is also necessary to update Settings.php and the paths to the Sources and Themes directories in the forum Administration Panel > Configuration > Server Settings.
NFS
We use NFS at WormBase to consolidate logs, simplify maintenance of temporary and static files, and provide access to common (file-system) databases.
NFS is served from wb-web1. NOTE: this may become an I/O performance bottleneck; we may eventually need a dedicated machine.
/usr/local/wormbase/shared
This directory will be mounted at the same path for each client.
Contents:
 logs/                 // Server logs
 databases/            // File-based databases for various searches
 tmp/                  // Temporary files
 images/               // Images
 static/               // Static files
 website-shared-files/ // Images and other shared files
Note that logs and databases are symlinks on each production node:

 /usr/local/wormbase/logs -> shared/logs
 /usr/local/wormbase/databases -> shared/databases
Install NFS
On Debian, the NFS server needs:
sudo apt-get install nfs-kernel-server nfs-common portmap
Specify what to share, to whom, and with which privileges
Share mounts are configured through /etc/exports
 # WormBase NFS for temporary and static content
 /usr/local/wormbase/shared 206.108.125.177(rw,no_root_squash) 206.108.125.190(rw,no_root_squash) 206.108.125.168(rw,no_root_squash) 206.108.125.191(rw,no_root_squash)
After making changes to /etc/exports, reload them with:
sudo exportfs -a
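The exports line above repeats the same share path and options for every client IP, so adding a node means editing the line by hand. The loop below is an illustrative sketch of generating those entries instead; `emit_exports` is a hypothetical helper name, not part of any WormBase tooling.

```shell
# Hypothetical helper: emit one /etc/exports client entry per IP,
# using the same share path and options as the exports file above.
emit_exports() {
  share=/usr/local/wormbase/shared
  for ip in "$@"; do
    printf '%s %s(rw,no_root_squash)\n' "$share" "$ip"
  done
}

# Example with the production node IPs listed above:
emit_exports 206.108.125.177 206.108.125.190 206.108.125.168 206.108.125.191
```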
Lock down access to NFS from most hosts
We can use /etc/hosts.deny to quickly lock down NFS:
 portmap: ALL
 lockd: ALL
 mountd: ALL
 rquotad: ALL
 statd: ALL
Allow access to select hosts
We can use /etc/hosts.allow to specifically allow access to NFS:
 # WormBase NFS services
 portmap: ***.***.***.***
 lockd: ***.***.***.***
 rquotad: ***.***.***.***
 mountd: ***.***.***.***
 statd: ***.***.***.***
Configure NFS clients
Install NFS
On Debian, NFS clients require:
sudo apt-get install nfs-common portmap
Setting up NFS share to mount at boot
sudo emacs /etc/fstab
Add
wb-web1.oicr.on.ca:/usr/local/wormbase/shared /usr/local/wormbase/shared nfs rw,rsize=32768,wsize=32768,intr,noatime 0 0
Manually mount the NFS share
 # The shared directory must already exist on the client
 sudo mount ${NFS_SERVER}:/usr/local/wormbase/shared /usr/local/wormbase/shared

or, using the entry in fstab:

 sudo mount /usr/local/wormbase/shared
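Mounting a path that is already mounted just stacks another mount on top of it, so a boot or maintenance script may want to check /proc/mounts first. A minimal sketch (the `is_nfs_mounted` helper is a hypothetical name, and it reads the mounts table on stdin so it can be exercised against sample data):

```shell
# Sketch: return success if the given path appears in the mounts table
# (fields: device, mountpoint, fstype, ...) with an nfs filesystem type.
# In practice, feed it /proc/mounts:  is_nfs_mounted PATH < /proc/mounts
is_nfs_mounted() {
  awk -v path="$1" '$2 == path && $3 ~ /^nfs/ { found = 1 } END { exit !found }'
}

# Example against a sample /proc/mounts line:
sample='wb-web1.oicr.on.ca:/usr/local/wormbase/shared /usr/local/wormbase/shared nfs rw,rsize=32768,wsize=32768 0 0'
if printf '%s\n' "$sample" | is_nfs_mounted /usr/local/wormbase/shared; then
  echo "already mounted; skipping mount"
fi
```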
Unmounting the NFS share
sudo umount /usr/local/wormbase/shared
Miscellaneous
Build the user preferences database
The website uses a MySQL back end to store user preferences, browsing history, and session data. This shouldn't ever need to be recreated (at least until we have a migration path in place from an old database to a new one!), but here's how to create it for reference. For now, this database is hosted on the same server providing the reverse proxy.
 mysql -u root -p < /usr/local/wormbase/website/production/util/user_login.sql
 mysql -u root -p -e 'grant all privileges on wormbase_user.* to wb@localhost'
 # All nodes currently use the same session database.
 mysql -u root -p -e 'grant all privileges on wormbase_user.* to wb@wb-web1.oicr.on.ca'
 mysql -u root -p -e 'grant all privileges on wormbase_user.* to wb@wb-web2.oicr.on.ca'
 mysql -u root -p -e 'grant all privileges on wormbase_user.* to wb@wb-web3.oicr.on.ca'
 mysql -u root -p -e 'grant all privileges on wormbase_user.* to wb@wb-web4.oicr.on.ca'
 mysql -u root -p -e 'grant all privileges on wormbase_user.* to wb@wb-mining.oicr.on.ca'
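Since the per-node GRANT statements differ only in the hostname, they can be generated in a loop and piped to `mysql -u root -p` instead of typed one at a time. A sketch (`grant_sql` is a hypothetical helper name):

```shell
# Sketch: emit one GRANT statement per node short-name.
grant_sql() {
  for node in "$@"; do
    printf 'grant all privileges on wormbase_user.* to wb@%s.oicr.on.ca;\n' "$node"
  done
}

# Example: generate the statements for all five nodes.
# To apply:  grant_sql ... | mysql -u root -p
grant_sql wb-web1 wb-web2 wb-web3 wb-web4 wb-mining
```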
Q: How/Where do I configure the location of the wormbase_user database in the application?
Logs
All relevant logs can be found at:
 ls /usr/local/wormbase/logs
 nginx-error.log     // The reverse proxy error log
 nginx-access.log    // The reverse proxy access log
 nginx-cache.log     // The reverse proxy cache log
 catalyst_error.log  // The Catalyst error log
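A quick way to spot back-end trouble in the proxy access log is to tally response status codes (a spike in 502s means the Catalyst back ends are down). This sketch assumes the log uses the standard combined-style format, where the status code is the ninth whitespace-separated field; `status_counts` is a hypothetical helper name.

```shell
# Sketch: count HTTP status codes in an access log, assuming a
# combined-style format (status is whitespace field 9).
status_counts() {
  awk '{ counts[$9]++ } END { for (s in counts) print s, counts[s] }' "$@"
}

# Example against sample lines; in practice:
#   status_counts /usr/local/wormbase/logs/nginx-access.log
printf '%s\n' \
  '1.2.3.4 - - [01/Jan/2012:00:00:00 -0500] "GET / HTTP/1.1" 200 512 "-" "curl"' \
  '1.2.3.4 - - [01/Jan/2012:00:00:01 -0500] "GET /x HTTP/1.1" 502 0 "-" "curl"' \
  > /tmp/sample-access.log
status_counts /tmp/sample-access.log
```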
See Also
Monitoring
Logs
EVERYTHING BELOW HERE IS DEPRECATED
Monitoring
See the monitoring services document. Note that Nagios requires Apache and FCGI.
Should I preserve the fastcgi,fcgi configuration just in case?
FastCGI, FCGI, Apache, and mod_perl
Originally, WormBase ran under apache + mod_perl.
We also experimented with fcgi and fcgid + apache.
Installing fastcgi
 curl -O http://www.fastcgi.com/dist/mod_fastcgi-2.4.6.tar.gz
 tar xzf mod_fastcgi*
 cd mod_fastcgi*
 cp Makefile.AP2 Makefile
 make top_dir=/usr/local/apache2
 sudo make top_dir=/usr/local/apache2 install
If you get an error on make saying it can't find special.mk (which is supposed to be distributed with httpd but isn't on CentOS and is not part of httpd-devel, either), try:
sudo apxs -n mod_fastcgi -i -a -c mod_fastcgi.c fcgi_buf.c fcgi_config.c fcgi_pm.c fcgi_protocol.c fcgi_util.c
Add an entry to httpd.conf like this:
 LoadModule fastcgi_module modules/mod_fastcgi.so

Note: if you use the apxs command above, it inserts an incorrect LoadModule line into your httpd.conf file. Edit it to read exactly as above.
Launch the fastcgi server
 # As a socket server in daemon mode
 /usr/local/wormbase/website/script/wormbase_fastcgi.pl \
     -l /tmp/wormbase.sock -n 5 -p /tmp/wormbase.pid -d

 # As a daemon bound to a specific port
 script/wormbase_fastcgi.pl -l :3001 -n 5 -p /tmp/wormbase.pid -d
Set up the fastcgi server to launch at boot
Symlink the webapp-fastcgi.init script to /etc/init.d
 cd /etc/init.d
 sudo ln -s /usr/local/wormbase/website/util/init/webapp-fastcgi.init wormbase-fastcgi
Set up symlinks in runlevels:
 cd ../rc3.d
 sudo ln -s ../init.d/wormbase-fastcgi S99wormbase-fastcgi
 cd ../rc5.d
 sudo ln -s ../init.d/wormbase-fastcgi S99wormbase-fastcgi
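The two runlevel symlinks differ only in the rcN.d directory, so they can be created in one loop. The sketch below builds the same layout under a scratch directory so the commands are safe to try; on a real node the base would be /etc and the ln commands would need sudo.

```shell
# Sketch: create S99 start links for runlevels 3 and 5 in one loop.
# Uses a temporary directory standing in for /etc.
base=$(mktemp -d)
mkdir -p "$base/init.d" "$base/rc3.d" "$base/rc5.d"
touch "$base/init.d/wormbase-fastcgi"

for level in 3 5; do
  ln -s ../init.d/wormbase-fastcgi "$base/rc$level.d/S99wormbase-fastcgi"
done
```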
Add a cron job that keeps FCGI under control
The following cron job will kill off fcgi children that exceed the specified memory limit (in bytes).
 sudo crontab -e

 */30 * * * * /usr/local/wormbase/website/util/crons/fastcgi-childreaper.pl \
     `cat /tmp/wormbase.pid` 104857600
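The reaper's limit argument (104857600) is in bytes, i.e. 100 MB, while tools like `ps -o rss` report resident set size in kilobytes. The shell sketch below illustrates only that unit conversion and comparison; the actual reaper is the Perl script named above, and `over_limit` is a hypothetical helper name.

```shell
# Sketch: decide whether a process exceeds the memory limit.
# Usage: over_limit RSS_IN_KB LIMIT_IN_BYTES
over_limit() {
  [ $(( $1 * 1024 )) -gt "$2" ]
}

# Example: a child with a 150000 kB RSS exceeds the 100 MB limit.
if over_limit 150000 104857600; then
  echo "would kill: RSS exceeds 100 MB"
fi
```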
mod_fcgid
mod_fcgid is an alternative to mod_fastcgi.
 cd src/
 wget http://www.carfab.com/apachesoftware/httpd/mod_fcgid/mod_fcgid-2.3.5.tar.gz
 tar xzf mod_fcgid-2.3.5.tar.gz
 cd mod_fcgid-2.3.5
 APXS=/usr/local/apache2/bin/apxs ./configure.apxs
 make
 sudo make install
Apache
Configure Apache to connect to the fastcgi server
Edit /usr/local/apache2/conf/extra/httpd-vhosts.conf
 <VirtualHost *:8000>
   # ServerName beta.wormbase.org
   ErrorLog /usr/local/wormbase/logs/wormbase2.error_log
   TransferLog /usr/local/wormbase/logs/wormbase2.access_log

   # 502 is a Bad Gateway error, and will occur if the backend server is down.
   # This allows us to display a friendly static page that says
   # "down for maintenance".
   Alias /_errors /home/todd/projects/wormbase/website/trunk/root/error-pages
   ErrorDocument 502 /_errors/502.html

   # Map dynamic images to the file system
   # (static images are located at img)
   Alias /images /tmp/wormbase/images/
   # <Directory /filesystem/path/to/MyApp/root/static>
   #   allow from all
   # </Directory>
   # <Location /myapp/static>
   #   SetHandler default-handler
   # </Location>

   # Static content served directly by Apache
   DocumentRoot /usr/local/wormbase/website/root
   # Alias /static /usr/local/wormbase/website-2.0/root

   # Approach 1: Running as a static server (Apache handles spawning of the webapp)
   # <IfModule fastcgi_module>
   #   FastCgiServer /usr/local/wormbase/website-2.0/script/wormbase_fastcgi.pl -processes 3
   #   Alias / /usr/local/wormbase/website-2.0/script/wormbase_fastcgi.pl/
   # </IfModule>

   # Approach 2: External process (via mod_fastcgi ONLY)
   <IfModule mod_fastcgi.c>
     # Connect to the Catalyst fcgi server running on localhost, port 7777...
     # FastCgiExternalServer /tmp/myapp.fcgi -host localhost:7777
     # ...or use the socket
     FastCgiExternalServer /tmp/wormbase.fcgi -socket /tmp/wormbase.sock

     # Place the app at root...
     Alias / /tmp/wormbase.fcgi/
     # ...or somewhere else
     Alias /wormbase/ /tmp/wormbase.fcgi/
   </IfModule>

   # fcgid configuration
   # <IfModule mod_fcgid>
   #   # This should point at your myapp/root
   #   DocumentRoot /usr/local/wormbase/beta.wormbase.org/root
   #   Alias /static /usr/local/wormbase/beta.wormbase.org/root/static
   #   <Location /static>
   #     SetHandler default-handler
   #   </Location>
   #
   #   Alias / /usr/local/wormbase/beta.wormbase.org/script/wormbase_fastcgi.pl/
   #   AddType application/x-httpd-php .php
   #   <Location />
   #     Options ExecCGI
   #     Order allow,deny
   #     Allow from all
   #     AddHandler fcgid-script .pl
   #   </Location>
   # </IfModule>
 </VirtualHost>
Edit /usr/local/apache2/conf/httpd.conf
Add the appropriate Listen directive for the port used above (e.g., Listen 8000).