General administration
Revision as of 15:21, 31 July 2011
WormBase Administration
This document describes maintenance of the various hardware and software components of WormBase. This includes monitoring the health of the site, starting and stopping various servers, description of cron jobs that keep the site running, the location and contents of important log files, and what to do when things go wrong.
Document conventions
Commands that require super-user privileges are prefaced with "sudo" or a "$" prompt. If the system is configured correctly, you should not need to be a root user in order to update the site.
*** Potential stumbling blocks are indented and highlighted with a preceding triple asterisk. Yikes!
Managing squid
Controlling the squid server
Test if the squid server is running:
fe> sudo /etc/rc.d/init.d/squid status
Start squid:
fe> sudo /etc/rc.d/init.d/squid start
Stop squid:
fe> sudo /etc/rc.d/init.d/squid stop
Reset the squid cache. This option will stop squid, reset the swap.state file, and restart squid. The end result is squid started anew with an empty cache.
fe> sudo /etc/rc.d/init.d/squid resetcache
Delete the squid httpd-style access_logs. Like reset cache, this option takes the server down briefly in order to rotate the squid logs.
fe> sudo /etc/rc.d/init.d/squid deletelogs
Reset the cache AND delete the access_logs. This command is typically used following a system update.
fe> sudo /etc/rc.d/init.d/squid fullreset
Test the squid configuration file (not necessary to bring the server down):
fe> sudo /etc/rc.d/init.d/squid parse
Reload the squid configuration file (for most changes, not necessary to bring the server down):
fe> sudo /etc/rc.d/init.d/squid reload
Monitoring squid
The installation of squid offers two options for monitoring the server.
The Cache Manager CGI
The Cache Manager CGI is only available from fe.wormbase.org:81. Note you must be "on localhost" in order to use this statistics viewer. You can access this CGI by tunneling port 81 traffic to fe.wormbase.org and then accessing the following URL:
http://localhost:81/squid/cachemgr.cgi
CacheHost   : fe.wormbase.org
CachePort   : 80
Managername : [LSs canine]
Password    : [none required]
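One way to establish the tunnel is with ssh local port forwarding. A sketch; the login name is a placeholder, and the command is printed rather than executed so the example has no side effects:

```shell
# Hypothetical tunnel command; substitute your own login on fe.wormbase.org.
# -N: run no remote command; -L: forward local port 81 to port 81 on the front end.
TUNNEL_CMD='ssh -N -L 81:localhost:81 yourlogin@fe.wormbase.org'

# Printed for review; run it directly to open the tunnel.
echo "$TUNNEL_CMD"
```

With the tunnel up, http://localhost:81/squid/cachemgr.cgi on your workstation reaches the Cache Manager CGI on the front end.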
RRD Tool / real-time graphical analysis
You can also monitor squid performance graphically in real time using RRDTool. Again, these tools are located behind the firewall at fe.wormbase.org:81.
1 day statistics: http://localhost:81/squid-monitor/1day.cgi
1 week statistics: http://localhost:81/squid-monitor/1week.cgi
Squid tips and tricks
- squid PID
The squid PID file is located at fe:/usr/local/squid/var/logs/squid.pid.
- squid_start
When squid launches, it first looks for the squid_start script adjacent to the squid binary. This is a useful place for storing additional administration tasks.
- Squid restarts with write-errors
If log files grow too large and squid can no longer write to them, it will restart.
- Purging objects from the cache
Occasionally, a corrupt file may be stored in the cache (ie if a script has a bug but does not generate a server error, so the broken output is cached).
To purge an object (ie URL) from the cache, use the squidclient binary.
*** You must be "on localhost" -- logged in to fe.wormbase.org -- to use this command.
/usr/local/squid/bin/squidclient -p [port] -m PURGE [url]
ie: squidclient -p 80 -h fe.wormbase.org -m PURGE \
    http://www.wormbase.org/db/gene/gene?name=unc-26
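Several stale URLs can be purged in one pass by looping over squidclient. A sketch with a hypothetical URL list, shown in dry-run form (the commands are echoed; drop the quotes-and-echo to actually issue the PURGE requests from fe.wormbase.org):

```shell
# Hypothetical list of stale URLs to evict from the cache, one per line.
URLS='http://www.wormbase.org/db/gene/gene?name=unc-26
http://www.wormbase.org/db/gene/gene?name=unc-13'

# Build the purge command for each URL; remove the surrounding echo/quotes
# to execute them for real (squidclient must run on fe.wormbase.org itself).
OUT=$(printf '%s\n' "$URLS" | while IFS= read -r url; do
    echo "/usr/local/squid/bin/squidclient -p 80 -h fe.wormbase.org -m PURGE $url"
done)
echo "$OUT"
```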
Log files
The various servers at WormBase create a number of different logs.
Squid
Squid creates a large number of logs but only two are useful on a day-to-day basis.
- /usr/local/squid/var/logs/cache.log
This is the primary source of information on the health of the squid server. This log is also echoed to the system log /var/log/messages. This log grows very slowly if squid is running well; if squid is sick, watch for a tremendous growth in the size of this log!
- httpd-style access logs
Using the configuration described above, squid creates httpd-style logs at /usr/local/squid/logs/access_log. These logs are akin to the httpd logs found in /usr/local/wormbase/logs/access_log and should be used for primary log analysis.
*** Note that these logs are very similar to httpd access_logs with the addition of a single column. This column records the squid response and hierarchy code, showing how the request was handled.
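That extra column makes it easy to tally how requests are being handled. A sketch using a fabricated two-line sample in place of the real access_log; the assumption (check your logformat) is that the CODE/HIERARCHY pair is the final field:

```shell
# Fabricated sample lines standing in for /usr/local/squid/logs/access_log.
# The last field is the CODE:HIERARCHY column squid appends.
cat > /tmp/sample_access_log <<'EOF'
1.2.3.4 - - [01/Jan/2011:00:00:00 +0000] "GET /db/gene HTTP/1.0" 200 512 TCP_HIT:NONE
1.2.3.5 - - [01/Jan/2011:00:00:01 +0000] "GET /db/seq HTTP/1.0" 200 4096 TCP_MISS:DIRECT
EOF

# Tally the response codes (the part of the last field before the colon).
TALLY=$(awk '{ split($NF, a, ":"); count[a[1]]++ }
             END { for (c in count) print c, count[c] }' /tmp/sample_access_log)
echo "$TALLY"
```

Run against the real log, this gives a quick hit/miss profile of the cache.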
Acedb
Acedb produces two logs. These are both located in the "database" folder of the current database.
*** Both of these logs MUST be writable by the acedb user! If they aren't, the database will not be able to start. This is not always obvious - look for cycling xinetd requests attempting to launch the database in /var/log/messages.
- log.wrm
This slowly growing log contains information on the status of the server.
- serverlog.wrm
This quickly growing log records queries to the database. It is really only useful when trying to debug slow queries.
*** If serverlog.wrm reaches 2 GB in size, acedb may not be able to write to the file. This causes the database to crash, and xinetd to fail in launching it! Don't let this happen (see below)!
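A simple cron-able guard is to check the size of serverlog.wrm against the 2 GB limit. A sketch run against a temporary stand-in file; the real path (the database/ folder of the current database) is the value to substitute:

```shell
# Stand-in for the real <database>/database/serverlog.wrm path.
LOG=/tmp/serverlog.wrm
printf 'query log data\n' > "$LOG"

LIMIT=$((2 * 1024 * 1024 * 1024))   # the 2 GB danger threshold
SIZE=$(wc -c < "$LOG")

# Warn (or purge -- see the cron jobs below) before the limit is reached.
if [ "$SIZE" -ge "$LIMIT" ]; then
    STATUS="over limit: purge now"
else
    STATUS="ok"
fi
echo "$STATUS"
```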
Apache
Apache creates an error log and access log that can be used for debugging and watching direct requests to back end origin servers.
- /usr/local/wormbase/logs/access_log
A record of all direct requests to the httpd origin server.
*** Since squid intercepts all requests -- and serves many directly from the cache -- the httpd access_log is not a true indicator of access statistics.
- /usr/local/wormbase/logs/error_log
Errors encountered during execution of requests at the origin server.
Cron jobs
This section describes cron jobs that are used to keep WormBase running smoothly.
*** Some jobs are specific to certain machines. The appropriate machine is indicated in parentheses where applicable.
*** All cron jobs shown here should be entered in the root crontab unless otherwise indicated.
Monitoring
- RRD-Tools / graphical analysis of squid logs (fe.wormbase.org)
*/5 * * * * /usr/local/apache/htdocs/squid-monitor/poll.pl fe.wormbase.org:80
Update the RRDTool-based graphs at 5 minute intervals. These statistics can be viewed ONLY from localhost on fe.wormbase.org at:
http://localhost:81/squid-monitor/1day.cgi
Log rotation
The following cron entries keep the log files in check.
- squid (fe.wormbase.org)
# Rotate the squid httpd-style access logs once per day at 2 AM
# This cron job is only required on fe.wormbase.org
0 2 * * * /usr/local/wormbase-admin/log_maintenance/rotation/rotate_squid_logs.pl
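The rotation script itself is not reproduced here; a minimal sketch of the idea (datestamped rename, then recreate an empty log) against a temporary directory. The paths and the reopen mechanism are assumptions -- the real script must also get squid to reopen its log file, which is why rotation briefly touches the server:

```shell
# Temporary stand-in for the squid log directory.
LOGDIR=$(mktemp -d)
printf 'old entries\n' > "$LOGDIR/access_log"

# Rename the live log with a datestamp, then recreate an empty one.
# (The real script also restarts or signals squid so it reopens the
#  log file; "squid -k rotate" is one such mechanism.)
STAMP=$(date +%Y%m%d)
mv "$LOGDIR/access_log" "$LOGDIR/access_log.$STAMP"
: > "$LOGDIR/access_log"

ls "$LOGDIR"
```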
- acedb (unc, vab, crestone)
The Acedb serverlog.wrm grows to epic proportions and rarely contains information that is useful on a day-to-day basis. When the logs grow very large, log rotation becomes painfully slow and can even crash the server. Instead, we purge the logs by writing a single bit to the file with the following cron job.
# Reset the serverlog.wrm once an hour
30 * * * * /usr/local/wormbase-admin/log_maintenance/rotation/purge_acedb_logs.pl
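The core of such a purge can be as simple as truncating the file in place, which keeps the inode (and therefore any open file handle acedb holds) intact. A sketch against a temporary stand-in file; the real script and path are assumptions:

```shell
# Stand-in path for the real serverlog.wrm.
SERVERLOG=/tmp/acedb_serverlog.wrm
printf 'query history we no longer need\n' > "$SERVERLOG"

# Truncate in place rather than delete/rename: the file keeps its inode,
# so a process that already has it open continues writing to the same file.
: > "$SERVERLOG"

echo "bytes after purge: $(wc -c < "$SERVERLOG")"
```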
Synchronization
- Check for new releases of the database / update software
The satellite servers are managed by a script that utilizes Bio::GMOD. This script - gmod_update_installation-wormbase.pl - checks for a new live release of the database. If one is present, it downloads the databases from the development server and installs them. The WormBase software on the server is synchronized to a checked-out version of the source regardless of whether a new database is present.
gmod_update_installation-wormbase.pl relies on a configuration file that should be specified for each server. Typically, it is sufficient to check for new updates twice a day.
0 1,13 * * * /usr/local/bin/gmod_update_installation-wormbase.pl \
    --config /path/to/config/file
See updating_wormbase.pod for additional details.
- Maintain software rsync modules (brie3)
brie3 (dev.wormbase.org) hosts an rsync module of the WormBase software. This module is used to keep mirror sites and production nodes up-to-date.
NOTE: THIS IS NOW DEPRECATED (2/2/2006) IN FAVOR OF A MORE SIMPLIFIED PROCEDURE.
The following cron job creates a nightly cvs export of the "stable" tag on the main development branch. Select files that may inadvertently break mirrors and production nodes are purged from this export. For convenience, the module resides on the FTP server in /usr/local/wormbase/ftp/pub/wormbase/mirror.
The "wormbase-live" symlink always points to the most current version.
0 3 * * * /usr/local/wormbase-admin/update_scripts/export_stable_software.pl
See "wormbase-admin/docs/updating_wormbase.pod" for how to create the stable tag in the repository.
- Synchronize release notes with the checked out source (brie3)
This cron job ensures that release notes distributed by Sanger are kept in the production html/ directory. This rather inane task should probably just be handled once by the update process.
0 1 * * * /usr/local/wormbase-admin/update_scripts/rsync_misc.pl
Troubleshooting / when things go wrong
- Symptom: hardware crash of fe.wormbase.org
In the event that fe.wormbase.org suffers a catastrophic hardware failure, DNS entries can be modified to point www -> unc.wormbase.org. Once this change propagates, WormBase will be restored.
- Symptom: periodic success/failures of a single page
Because of the distributed nature of the WormBase infrastructure, troubleshooting problems is more complicated than on a single-server installation. In particular, if a single request succeeds and fails with roughly equal frequency, one of the origin servers may be down.
Check the following things in this order:
- Is Acedb running and responsive on the origin servers?
- Is Mysqld running and responsive on the origin servers?
- Is httpd running and responsive on the origin servers?
Check that httpd is running on all back-end origin servers. If it is not, ensure that there is sufficient disk space for logging and restart apache by:
brie6> sudo /usr/local/apache/bin/apachectl start
- Is squid running and responsive on fe.wormbase.org?
The least likely possibility is that squid itself has crashed on fe.wormbase.org. You can check this by:
fe> sudo /etc/rc.d/init.d/squid status
If squid is down, ensure that there is sufficient disk space and that the log directories are writable by the squid user (if these conditions are not met, squid will exit and send errors to /usr/local/squid/var/logs/cache.log). Restart squid with:
fe> sudo /etc/rc.d/init.d/squid start
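The two pre-flight checks above can be scripted. A sketch demonstrated against a temporary directory; on the front end you would point it at the real log directory, and run the writability test as the squid user (e.g. via sudo -u squid), which is an assumption about your sudo setup:

```shell
# Stand-in for /usr/local/squid/var/logs on fe.wormbase.org.
SQUID_LOGDIR=$(mktemp -d)

# 1. Free space on the filesystem that holds the logs.
df -k "$SQUID_LOGDIR"

# 2. Writability.  The real check should run as the squid user,
#    e.g. "sudo -u squid test -w /usr/local/squid/var/logs".
if [ -w "$SQUID_LOGDIR" ]; then
    WRITABLE=yes
else
    WRITABLE=no
fi
echo "writable: $WRITABLE"
```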
Appendices
Appendix 2: Squid response and hierarchy codes.
Response codes
The following codes are appended to the squid-generated, httpd style access logs. The TCP_ codes refer to requests on the HTTP port (usually 3128). The UDP_ codes refer to requests on the ICP port (usually 3130) and do not apply to the current WormBase configuration. These codes (in conjunction with the hierarchy codes listed below) describe how the cache handled the request.
- TCP_HIT
A valid copy of the requested object was in the cache.
- TCP_MISS
The requested object was not in the cache.
- TCP_REFRESH_HIT
The requested object was cached but STALE. The IMS (If-Modified-Since) query for the object resulted in "304 not modified".
- TCP_REF_FAIL_HIT
The requested object was cached but STALE. The IMS query failed and the stale object was delivered.
- TCP_REFRESH_MISS
The requested object was cached but STALE. The IMS query returned the new content.
- TCP_CLIENT_REFRESH_MISS
The client issued a "no-cache" pragma, or some analogous cache control command along with the request. Thus, the cache has to refetch the object.
- TCP_IMS_HIT
The client issued an IMS request for an object which was in the cache and fresh.
- TCP_SWAPFAIL_MISS
The object was believed to be in the cache, but could not be accessed.
- TCP_NEGATIVE_HIT
Request for a negatively cached object, e.g. "404 not found", which the cache believes to be inaccessible. Also refer to the explanations for negative_ttl in your squid.conf file.
- TCP_MEM_HIT
A valid copy of the requested object was in the cache and it was in memory, thus avoiding disk accesses.
- TCP_DENIED
Access was denied for this request.
- TCP_OFFLINE_HIT
The requested object was retrieved from the cache during offline mode. Offline mode never validates any object; see offline_mode in the squid.conf file.
- UDP_HIT
A valid copy of the requested object was in the cache.
- UDP_MISS
The requested object is not in this cache.
- UDP_DENIED
Access was denied for this request.
- UDP_INVALID
An invalid request was received.
- UDP_MISS_NOFETCH
During "-Y" startup, or during frequent failures, a cache in hit-only mode will return either UDP_HIT or this code. Neighbours will thus only fetch hits.
- NONE
Seen with errors and cachemgr requests.
Hierarchy codes
The following hierarchy codes are used with Squid-2. They convey information about how the request was handled.
- NONE
For TCP HIT, TCP failures, cachemgr requests and all UDP requests, there is no hierarchy information.
- DIRECT
The object was fetched from the origin server.
- SIBLING_HIT
The object was fetched from a sibling cache which replied with UDP_HIT.
- PARENT_HIT
The object was requested from a parent cache which replied with UDP_HIT.
- DEFAULT_PARENT
No ICP queries were sent. This parent was chosen because it was marked "default" in the config file.
- SINGLE_PARENT
The object was requested from the only parent appropriate for the given URL.
- FIRST_UP_PARENT
The object was fetched from the first parent in the list of parents.
- NO_PARENT_DIRECT
The object was fetched from the origin server, because no parents existed for the given URL.
- FIRST_PARENT_MISS
The object was fetched from the parent with the fastest (possibly weighted) round trip time.
- CLOSEST_PARENT_MISS
This parent was chosen because it had the lowest RTT measurement to the origin server. See also the closest-only peer configuration option.
- CLOSEST_PARENT
The parent selection was based on our own RTT measurements.
- CLOSEST_DIRECT
Our own RTT measurements returned a shorter time than any parent.
- NO_DIRECT_FAIL
The object could not be requested because of a firewall configuration, see also never_direct and related material, and no parents were available.
- SOURCE_FASTEST
The origin site was chosen, because the source ping arrived fastest.
- ROUNDROBIN_PARENT
No ICP replies were received from any parent. The parent was chosen, because it was marked for round robin in the config file and had the lowest usage count.
- CACHE_DIGEST_HIT
The peer was chosen, because the cache digest predicted a hit. This option was later replaced in order to distinguish between parents and siblings.
- CD_PARENT_HIT
The parent was chosen, because the cache digest predicted a hit.
- CD_SIBLING_HIT
The sibling was chosen, because the cache digest predicted a hit.
- NO_CACHE_DIGEST_DIRECT
This output seems to be unused?
- CARP
The peer was selected by CARP.
- ANY_PARENT
part of src/peer_select.c:hier_strings[].
- INVALID CODE
part of src/peer_select.c:hier_strings[].
Almost any of these may be preceded by 'TIMEOUT_' if the two-second (default) timeout occurs while waiting for all ICP replies to arrive from neighbors; see also the icp_query_timeout configuration option.
See Also
updating_wormbase.pod - The complete guide for updating WormBase
wormbase_infrastructure.pod - The WormBase hardware and software infrastructure
Author
Tharris 13:42, 17 November 2005 (EST)
Original attribution: Todd Harris (harris@cshl.edu)
$Id: wormbase_administration.pod,v 1.2 2005/08/31 20:23:36 todd Exp $
Copyright (c) 2004-2005 Cold Spring Harbor Laboratory