Difference between revisions of "WormBase Infrastructure"

From WormBaseWiki
 
(8 intermediate revisions by one other user not shown)
Line 2: Line 2:
  
 
This document describes the WormBase infrastructure including hardware and software configuration.
 
This document describes the WormBase infrastructure including hardware and software configuration.
 
=== Document conventions ===
 
 
Commands that require super-user privileges are prefaced with "sudo" or a "$" prompt. If the system is configured correctly, you should not need to be a root user in order to update the site.
 
 
<nowiki>*** Potential stumbling blocks are indented and highlighted with a
 
*** preceding triple asterisk.  Yikes!
 
</nowiki>
 
  
 
== General network topology ==
 
== General network topology ==
Line 50: Line 42:
 
== Servers-at-a-glance  ==
 
== Servers-at-a-glance  ==
  
=== Cold Spring Harbor ===
+
=== Production Nodes (GBrowse) ===
 
 
{| cellspacing="1" cellpadding="2" border="1"
 
|-
 
! primary name
 
! aliases/other FQDNs
 
! tasks
 
! ports*
 
! mysqld server ID
 
|- align="top"
 
| fe.wormbase.org
 
| www.wormbase.org
 
|
 
*primary front end Squid caching server
 
*balances load to back end servers
 
 
 
| '''80'''
 
| -
 
|- align="top"
 
| fe2.wormbase.org
 
|
 
|
 
|
 
| -
 
|- align="top"
 
| unc.wormbase.org
 
| align="top" |
 
brie6.cshl.edu, stein.cshl.edu
 
 
 
|
 
*serves up most AceDB-based pages at WormBase
 
*hosts wormbase and wormbook mailing lists
 
*hosts WormBook (not accelerated by fe.wormbase.org)
 
*hosts steinlab.cshl.edu (not accelerated by fe.wormbase.org)
 
 
 
|
 
'''ssh: 22'''
 
 
 
'''httpd: 80''' (WormBook)<br>httpd: 8080<br>
 
 
 
| 7
 
|- align="top"
 
| be1
 
|
 
|
 
|
 
| 8
 
|- align="top"
 
| vab.wormbase.org
 
| align="top" |
 
|
 
*serves Genome Browser pages for WormBase
 
 
 
| httpd: 8080
 
| 2
 
|- align="top"
 
| gene.wormbase.org
 
|
 
|
 
| httpd: 8080<br>
 
| 3
 
|- align="top"
 
| blast.wormbase.org
 
| align="top" |
 
|
 
*provides blast and blat searches for WormBase
 
 
 
|
 
'''ssh: 22'''
 
 
 
httpd: 8080
 
 
 
| 4
 
|- align="top"
 
| aceserver.cshl.org
 
| align="top" |
 
|
 
*provides support for AQL, WB query language, and cisortho
 
 
 
|
 
'''httpd: 80'''
 
 
 
'''sgifaceserver: 2005'''<br>
 
 
 
'''mysqld: 3306'''
 
 
 
| 5
 
|- align="top"
 
| biomart.wormbase.org
 
| align="top" |
 
|
 
*hosts the WormBase implementation of BioMart
 
 
 
| 80 (?)
 
|
 
|- align="top"
 
| dev.wormbase.org
 
| align="top" |
 
brie3.cshl.edu
 
 
 
|
 
*hosts the WormBase FTP site
 
 
 
|
 
'''ftp: 21'''
 
 
 
'''ssh: 22'''<br>
 
 
 
'''httpd: 80'''
 
 
 
| 1 (master)
 
|- align="top"
 
| crestone.cshl.org
 
| align="top" |
 
|
 
*secondary WormBase development machine
 
*hosts the WormBaseWiki
 
 
 
| httpd: 8080
 
| 6
 
|}
 
 
 
=== OICR ===
 
 
 
==== Production Nodes (GBrowse) ====
 
  
 
{| cellspacing="1" cellpadding="2" border="1"
 
{| cellspacing="1" cellpadding="2" border="1"
Line 195: Line 63:
 
httpd: 80, 8080
 
httpd: 80, 8080
 
| apache, mysql
 
| apache, mysql
| GBrowse2 master server, GBrowse1 server
+
|
| gbrowse
+
|  
 
| no
 
| no
 
| yes
 
| yes
Line 209: Line 77:
 
httpd: 80
 
httpd: 80
 
| apache, mysql
 
| apache, mysql
| GBrowse slave server
+
|
| gbrowse
+
|  
 
| no
 
| no
 
| yes
 
| yes
Line 216: Line 84:
  
 
|- align="top"
 
|- align="top"
| wb-web3-warm.oicr.on.ca
+
| wb-web3.oicr.on.ca
 
|  
 
|  
 
|  
 
|  
Line 223: Line 91:
 
httpd: 80
 
httpd: 80
 
| apache, mysql
 
| apache, mysql
| GBrowse slave server
+
|
| gbrowse
+
|
 
| no
 
| no
 
| yes
 
| yes
Line 230: Line 98:
  
 
|- align="top"
 
|- align="top"
| wb-web4-warm.oicr.on.ca
+
| wb-web4.oicr.on.ca
 
|  
 
|  
 
|  
 
|  
Line 237: Line 105:
 
httpd: 80
 
httpd: 80
 
| apache, mysql
 
| apache, mysql
| GBrowse slave server
+
|
| gbrowse
+
|
 
| no
 
| no
 
| yes
 
| yes
Line 274: Line 142:
 
|}
 
|}
  
==== Development servers ====
+
=== Development servers ===
  
 
{| cellspacing="1" cellpadding="2" border="1"
 
{| cellspacing="1" cellpadding="2" border="1"
Line 301: Line 169:
 
"Production nodes" or "live servers" are the servers that provide all the functionality at www.wormbase.org.
 
"Production nodes" or "live servers" are the servers that provide all the functionality at www.wormbase.org.
  
=== fe.wormbase.org ===
+
TODO: memcached servers
 +
nginx description
  
This server acts as a proxy for the entire WormBase site fielding requests for www.wormbase.org on port 80 using the reverse-proxy caching server "squid". Each request is filtered through a simple load balancing script which rewrites the URL and sends the request to an appropriate back end server. Returned results are cached in memory and/or on disk before being returned to the client to accelerate subsequent requests. Internally, this server is also known as fe.wormbase.org (for "front end"). fe.wormbase.org is configured as a redundant RAID in case of catastrophic disk failure.
+
=== web1 ===
  
fe.wormbase.org currently redirects the following requests (evaluated in this order):
+
This server acts as a proxy for the entire wormbase.org site as well as a standard web node. It fields requests for wormbase.org on port 80 using the reverse-proxy daemon "nginx".  nginx distributes requests across a pool of partially redundant webservers, caching the result to accelerate future requests.  Principal log files for the site are located here.
  
{| border="1"
+
=== wb-web[1-4].oicr.on.ca ===
! URL
 
! host
 
|-
 
| /.*squid\/cachemgr\.cgi/
 
| localhost
 
|-
 
| /.*\/gbrowse\/.*/
 
| vab
 
|-
 
| /cisortho/
 
| aceserver
 
|-
 
| /wiki/
 
| crestone
 
|-
 
| <nowiki>*</nowiki>
 
| unc (brie6)
 
|}
 
  
=== unc.wormbase.org (brie6) ===
+
cname: none
  
Formerly the main www.wormbase.org server, unc.wormbase.org is a backend origin server running the entire WormBase site. It also serves the mailing lists archives, the RSS feeds, and access log statistics. unc also hosts a number of virtual hosts that do not pass through the proxy server on fe. Port 80 for unc.wormbase.org is open in the firewall.
+
These four servers act as redundant cluster nodes. Each runs our web app as a standalone FCGI process bound to a TCP socket.
  
=== vab.wormbase.org ===
+
=== wb-mining.oicr.on.ca ===
  
vab answers requests on port 80 from fe.wormbase.org with httpd. Currently, vab.wormbase.org is only serving requests to the Genome Browser although it is capable of serving any request at WormBase. It hosts no virtual hosts. Port 80 for vab.wormbase.org is open in the firewall.
+
cname: mining.wormbase.org
  
=== aceserver.cshl.org ===
+
BLAST and BLAT requests are handled by blast.wormbase.org. Requests for blast analyses at blast.wormbase.org do not pass through the proxy server on fe. blast.wormbase.org is also capable of handling load-balanced requests, but the redirector on fe.wormbase.org is not currently configured for this. Port 80 for blast.wormbase.org is open in the firewall.
  
aceserver is the publicly accessible data mining server. Requests to aceserver.cshl.org do not pass through the proxy server on fe. Ports 80, 3306 (mysql), and 2005 (acedb) are open for aceserver.cshl.org.
+
=== wb-social.oicr.on.ca ===
  
aceserver.cshl.org also handles
+
cname: blog.wormbase.org
 +
cname: forums.wormbase.org
 +
cname: wiki.wormbase.org
  
    http://www.wormbase.org/db/searches/aql_query
+
=== wb-biomart.oicr.on.ca ===
    http://www.wormbase.org/db/searches/wb_query
 
    http://www.wormbase.org/cisortho/
 
  
=== blast.wormbase.org ===
+
cname: biomart.wormbase.org
 
 
BLAST and BLAT requests are handled by blast.wormbase.org. Requests for blast analyses at blast.wormbase.org do not pass through the proxy server on fe. blast.wormbase.org is also capable of handling load-balanced requests, but the redirector on fe.wormbase.org is not currently configured for this. Port 80 for blast.wormbase.org is open in the firewall.
 
 
 
=== biomart.wormbase.org ===
 
  
 
biomart.wormbase.org serves WormMart. Requests to biomart.wormbase.org do not pass through the proxy server on fe.
 
biomart.wormbase.org serves WormMart. Requests to biomart.wormbase.org do not pass through the proxy server on fe.
Line 433: Line 280:
  
 
The Cache Manager CGI is only available from fe.wormbase.org:81. Note you must be "on localhost" in order to use this statistics viewer.
 
The Cache Manager CGI is only available from fe.wormbase.org:81. Note you must be "on localhost" in order to use this statistics viewer.
 +
 +
[[Category: Architecture (Web Dev)]]

Latest revision as of 19:14, 18 June 2014

The WormBase Infrastructure

This document describes the WormBase infrastructure including hardware and software configuration.

General network topology

The network topology of the WormBase site looks like this:

                   REQUEST
                      |                                (Non-accelerated / non-cached requests)
                      |---------------------------------------------------------------------------
                      |                              |             |             |             |
                      |                              |             |             |             |
            |---------|---------|                    |             |             |             |
            |         |         |                    |             |             |             |
            |         |         |                    |             |             |             |
            |        \/         |                    |             |             |             |
            |       squid       |                    |             |             |             |
            |         |         | www (fe)           |             |             |             |
            |        \/         |                    |             |             |             |
            |   load balancer   |                    |             |             |             |
            |         |         |                    |             |             |             |
            |---------|---------|                    |             |             |             |
                      |                              |             |             |             |
                      |                              |             |             |             |
       |------------------------------|              |             |             |             |
       |              |               |              \/            \/            \/            \/
 |-----------|  |-----------|  |-----------|    |-----------| |-----------| |-----------| |-----------|
 |     |     |  |     |     |  |           |    |           | |           | |           | |           |
 |   httpd   |  |   httpd   |  |   httpd   |    |   httpd   | | httpd/ace | |  biomart  | |  future   |
 |    unc    |  |    vab    |  |  crestone |    |   blast   | | aceserver | |           | |           |
 -------------  -------------  -------------    ------------- ------------- ------------- -------------

The WormBase infrastructure is composed of:

  • a pool of partially redundant web servers running GBrowse, each with all required GBrowse databases
  • a pool of partially redundant web servers running the web application
  • a pool of redundant database servers running AceDB and MySQL
  • a caching/load balancing server that distributes requests to the server pool, aggressively caching responses
  • major services are either served on dedicated hosts (GBrowse) or as VirtualHosts on specific servers (wiki, blog, forums)

Servers-at-a-glance

Production Nodes (GBrowse)

primary name          aliases/other FQDNs  ports*                           services              primary purpose       hosted components                      acedb?  GFF mysql?  backup
wb-web1.oicr.on.ca                         ssh: 22; httpd: 80, 8080         apache, mysql                                                                      no      yes         monthly
wb-web2.oicr.on.ca                         ssh: 22; httpd: 80               apache, mysql                                                                      no      yes         never
wb-web3.oicr.on.ca                         ssh: 22; httpd: 80               apache, mysql                                                                      no      yes         never
wb-web4.oicr.on.ca                         ssh: 22; httpd: 80               apache, mysql                                                                      no      yes         never
wb-social.oicr.on.ca                       ssh: 22; httpd: 80, 8080, 8081   apache, mysql         community components  blog (80), forums (8080), wiki (8081)  no      no          weekly
wb-mining.oicr.on.ca                       ssh: 22; httpd: 80; mysql: 3306  apache, mysql, acedb  data mining                                                  yes     yes         monthly

Development servers

primary name      aliases/other FQDNs  tasks                                 ports*              services
dev.wormbase.org  wb-dev.oicr.on.ca    primary WormBase development machine  ssh: 22; httpd: 80  apache, mysql, gbrowse

* Firewall exceptions in boldface

Production nodes

"Production nodes" or "live servers" are the servers that provide all the functionality at www.wormbase.org.

TODO: memcached servers nginx description

web1

This server acts as a proxy for the entire wormbase.org site as well as a standard web node. It fields requests for wormbase.org on port 80 using the reverse-proxy daemon "nginx". nginx distributes requests across a pool of partially redundant webservers, caching the result to accelerate future requests. Principal log files for the site are located here.

wb-web[1-4].oicr.on.ca

cname: none

These four servers act as redundant cluster nodes. Each runs our web app as a standalone FCGI process bound to a TCP socket.
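Because each node exposes the web app on a TCP socket, a plain connection attempt is enough to tell whether a node is accepting requests. The following is a minimal sketch of such a check, not the monitoring actually used at WormBase. The host names come from this page; the port number (`FCGI_PORT`) and the helper names are assumptions made for illustration only.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Host names are taken from this page; the port is NOT documented here
# and is a placeholder chosen for the example.
FCGI_PORT = 9000
POOL = [f"wb-web{n}.oicr.on.ca" for n in range(1, 5)]


def reachable_nodes(pool=POOL, port=FCGI_PORT):
    """Return the subset of pool members that accept TCP connections."""
    return [host for host in pool if is_listening(host, port)]
```

Calling `reachable_nodes()` from a machine inside the cluster network would list the nodes whose FCGI socket currently accepts connections.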

wb-mining.oicr.on.ca

cname: mining.wormbase.org

BLAST and BLAT requests are handled by blast.wormbase.org. Requests for blast analyses at blast.wormbase.org do not pass through the proxy server on fe. blast.wormbase.org is also capable of handling load-balanced requests, but the redirector on fe.wormbase.org is not currently configured for this. Port 80 for blast.wormbase.org is open in the firewall.

wb-social.oicr.on.ca

cname: blog.wormbase.org
cname: forums.wormbase.org
cname: wiki.wormbase.org

wb-biomart.oicr.on.ca

cname: biomart.wormbase.org

biomart.wormbase.org serves WormMart. Requests to biomart.wormbase.org do not pass through the proxy server on fe.

Development nodes

WormBase also maintains two development nodes.

dev.wormbase.org (brie3)

dev.wormbase.org is the current development server at WormBase. It maintains a CVS-checked out version of the source and database tarballs that is used to automatically push software and new databases onto the production nodes and mirror servers (described below). dev.wormbase.org also hosts the WormBase FTP site, aliased to ftp.wormbase.org.

crestone.cshl.org

crestone hosts the WormBase rearchitecture. Please don't junk it up with files -- disk space is at a premium!

Frozen release servers

freeze1 -- https://freeze1.wormbase.org:8333/

ws100.wormbase.org      143.48.220.44
ws110.wormbase.org      143.48.220.62
ws120.wormbase.org      143.48.220.46
ws130.wormbase.org      143.48.220.65



freeze2 -- https://freeze2.wormbase.org:8333/

ws140.wormbase.org      143.48.220.55
ws150.wormbase.org      143.48.220.66
ws160.wormbase.org      143.48.220.67


brie6 -- https://brie6.cshl.edu:8333/

 ws170.wormbase.org 143.48.220.208
 ws180.wormbase.org 143.48.220.69
 ws190.wormbase.org 143.48.220.192

be1.wormbase.org -- https://be1:8333/

 none currently

Typical request flow / infrastructure rationale

The WormBase infrastructure and design decisions are best understood by following requests through the system.

Static pages

First, consider a request for the home page, index.html. This request is received at www.wormbase.org by the squid server running on port 80. If this is the first request for the home page (unlikely!), squid passes the request to an instance of the load balancer script. This script rewrites the URL and sends it to one of the back end servers, unc or vab. The back end server returns the page to fe.wormbase.org, which caches it in memory and on disk and returns it to the user. Subsequent requests are served directly from the cache, bypassing the back end servers altogether.
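The cache-then-back-end flow described above can be modelled in a few lines. This is a toy sketch rather than squid's behaviour: the `FrontEnd` class, the `fetch` callable and the round-robin choice between back ends are all assumptions, since the page does not say how the load balancer script picks a server.

```python
from itertools import cycle


class FrontEnd:
    """Toy model of the front end: answer from cache, else ask a back end."""

    def __init__(self, backends, fetch):
        # Round-robin selection is an assumption; the real script's policy
        # is not documented on this page.
        self._backends = cycle(backends)
        self._fetch = fetch  # callable(backend, url) -> response body
        self._cache = {}     # URL -> cached response body

    def get(self, url):
        if url in self._cache:
            # Cache hit: the back end servers are bypassed altogether.
            return self._cache[url]
        backend = next(self._backends)
        body = self._fetch(backend, url)
        self._cache[url] = body
        return body
```

With `fe = FrontEnd(["unc", "vab"], fetch)`, the first `fe.get("/index.html")` calls `fetch` once and every later request for the same URL is answered from the in-memory dictionary.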

Dynamic pages

Consider a request for a typical gene page, unc-2. This request is received at www.wormbase.org by the squid server. Assuming that this is the first access of the unc-2 gene page during this release cycle, the load balancer sends the request to either unc.wormbase.org or vab.wormbase.org. The returned page is cached by the squid server and returned to the client.

It is important to note that the Gene page contains dynamically generated images. The URLs for these images are now absolute. Thus, when a cached gene page is loaded, the client can correctly request the image from the appropriate back end server. For this to work correctly, the back end servers *must* be registered with DNS and have port 80 open in the firewall.
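To illustrate why absolute URLs matter here: a cached page is served by the front end, so an image link on it has to name the back end that actually holds the generated file. A minimal sketch, assuming a hypothetical helper `absolute_image_url` and a made-up image path (neither appears in the WormBase code):

```python
from urllib.parse import urljoin


def absolute_image_url(backend_host: str, image_path: str) -> str:
    """Build an absolute URL that points at the back end holding the image."""
    # Port 80 is implied by the http scheme, matching the firewall note above.
    return urljoin(f"http://{backend_host}/", image_path)
```

For example, `absolute_image_url("vab.wormbase.org", "/dynamic_images/unc-2.png")` yields a link that still resolves to vab even when the surrounding page comes out of the cache on the front end.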

Special cases

  • POST requests

Because squid uses URLs as identifiers in the cache, any page view that results from a POST action will not be cached.

  • GBrowse (/db/seq/gbrowse)

Requests for GBrowse (/db/seq/gbrowse) are not cached. Images created by gbrowse_img (such as those embedded on the Gene Summary and Clone pages) *are* cached, as these can be considerably more "static" than individual GBrowse views. See the section "Access Control Lists" for additional details. Currently, requests for GBrowse are handled exclusively by vab.wormbase.org.

  • Mailing list archives (/mailarch/)
  • RSS feeds (/rss/* and /private/manage_newsfeeds)
  • Access log statistics (/stats/)
  • The Feedback form (/db/misc/feedback)

These items are all handled directly by unc.wormbase.org (redirection is handled at the level of the redirector on fe.wormbase.org).

  • The Cache Manager CGI

The Cache Manager CGI is only available from fe.wormbase.org:81. Note you must be "on localhost" in order to use this statistics viewer.
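The special cases above boil down to two decisions per request: which back end receives it, and whether the response may be cached. The sketch below is an illustration, not the actual redirector script. The routing patterns and their order are taken from the redirect table in the older revision of this page; the function names and the `/db/seq/gbrowse_img` path prefix are assumptions.

```python
import re

# First-match routing rules, in the order listed in the older revision's
# redirect table. Anything that matches no rule goes to unc.
ROUTES = [
    (re.compile(r".*squid/cachemgr\.cgi"), "localhost"),
    (re.compile(r".*/gbrowse/.*"), "vab"),
    (re.compile(r"/cisortho/"), "aceserver"),
    (re.compile(r"/wiki/"), "crestone"),
]
DEFAULT_BACKEND = "unc"


def route(path: str) -> str:
    """Return the back end for a request path (first matching rule wins)."""
    for pattern, host in ROUTES:
        if pattern.search(path):
            return host
    return DEFAULT_BACKEND


def is_cacheable(method: str, path: str) -> bool:
    """Decide whether a response may be stored in the cache."""
    if method.upper() == "POST":
        # The cache is keyed by URL, so POST results are never cached.
        return False
    if path.startswith("/db/seq/gbrowse_img"):
        # Assumed path prefix for gbrowse_img; its images are cached.
        return True
    if path.startswith("/db/seq/gbrowse"):
        # Interactive GBrowse views are not cached.
        return False
    return True
```

Because the rules are evaluated in order and unc is the fallback, the mailing list archives, RSS feeds, statistics and feedback form all land on unc without needing rules of their own.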