General administration
Revision as of 15:21, 31 July 2011
WormBase Administration
This document describes maintenance of the various hardware and software components of WormBase. This includes monitoring the health of the site, starting and stopping various servers, description of cron jobs that keep the site running, the location and contents of important log files, and what to do when things go wrong.
Document conventions
Commands that require super-user privileges are prefaced with "sudo" or a "$" prompt. If the system is configured correctly, you should not need to be a root user in order to update the site.
*** Potential stumbling blocks are indented and highlighted with a preceding triple asterisk. Yikes!
Managing squid
Controlling the squid server
Test if the squid server is running:
fe> sudo /etc/rc.d/init.d/squid status
Start squid:
fe> sudo /etc/rc.d/init.d/squid start
Stop squid:
fe> sudo /etc/rc.d/init.d/squid stop
Reset the squid cache. This option will stop squid, reset the swap.state file, and restart squid. The end result is squid started anew with an empty cache.
fe> sudo /etc/rc.d/init.d/squid resetcache
Delete the squid httpd-style access_logs. Like reset cache, this option takes the server down briefly in order to rotate the squid logs.
fe> sudo /etc/rc.d/init.d/squid deletelogs
Reset the cache AND delete the access_logs. This command is typically used following a system update.
fe> sudo /etc/rc.d/init.d/squid fullreset
Test the squid configuration file (not necessary to bring the server down):
fe> sudo /etc/rc.d/init.d/squid parse
Reload the squid configuration file (for most changes, not necessary to bring the server down):
fe> sudo /etc/rc.d/init.d/squid reload
Monitoring squid
The installation of squid offers two options for monitoring the server.
The Cache Manager CGI
The Cache Manager CGI is only available from fe.wormbase.org:81. Note you must be "on localhost" in order to use this statistics viewer. You can access this CGI by tunneling port 81 traffic to fe.wormbase.org and then accessing the following URL:
http://localhost:81/squid/cachemgr.cgi
CacheHost   : fe.wormbase.org
CachePort   : 80
Managername : [LSs canine]
Password    : [none required]
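One way to establish the tunnel is with ssh local port forwarding. A sketch; the login name is a placeholder, and the command is printed rather than executed so the example has no side effects:

```shell
# Hypothetical tunnel command; substitute your own login on fe.wormbase.org.
# -N: run no remote command; -L: forward local port 81 to port 81 on the front end.
TUNNEL_CMD='ssh -N -L 81:localhost:81 yourlogin@fe.wormbase.org'

# Printed for review; run it directly to open the tunnel.
echo "$TUNNEL_CMD"
```

With the tunnel up, http://localhost:81/squid/cachemgr.cgi on your workstation reaches the Cache Manager CGI on the front end.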
RRD Tool / real-time graphical analysis
You can also monitor squid performance graphically in real time using RRDTool. Again, these tools are located behind the firewall at fe.wormbase.org:81.
1 day statistics: http://localhost:81/squid-monitor/1day.cgi
1 week statistics: http://localhost:81/squid-monitor/1week.cgi
Squid tips and tricks
- squid PID
The squid PID file is located at fe:/usr/local/squid/var/logs/squid.pid.
- squid_start
When squid launches, it first looks for the squid_start script adjacent to the squid binary. This is a useful place for storing additional administration tasks.
- Squid restarts with write-errors
If log files grow too large and squid can no longer write to them, it will restart.
- Purging objects from the cache
Occasionally, a corrupt file may be stored in the cache (ie if a script has a bug but does not generate a server error, so the broken output is cached).
To purge an object (ie URL) from the cache, use the squidclient binary.
*** You must be "on localhost" -- logged in to fe.wormbase.org -- to use this command.
/usr/local/squid/bin/squidclient -p [port] -m PURGE [url]
ie: squidclient -p 80 -h fe.wormbase.org -m PURGE \
    http://www.wormbase.org/db/gene/gene?name=unc-26
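Several stale URLs can be purged in one pass by looping over squidclient. A sketch with a hypothetical URL list, shown in dry-run form (the commands are echoed; drop the quotes-and-echo to actually issue the PURGE requests from fe.wormbase.org):

```shell
# Hypothetical list of stale URLs to evict from the cache, one per line.
URLS='http://www.wormbase.org/db/gene/gene?name=unc-26
http://www.wormbase.org/db/gene/gene?name=unc-13'

# Build the purge command for each URL; remove the surrounding echo/quotes
# to execute them for real (squidclient must run on fe.wormbase.org itself).
OUT=$(printf '%s\n' "$URLS" | while IFS= read -r url; do
    echo "/usr/local/squid/bin/squidclient -p 80 -h fe.wormbase.org -m PURGE $url"
done)
echo "$OUT"
```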
Log files
The various servers at WormBase create a number of different logs.
Squid
Squid creates a large number of logs but only two are useful on a day-to-day basis.
- /usr/local/squid/var/logs/cache.log
This is the primary source of information on the health of the squid server. This log is also echoed to the system log /var/log/messages. This log grows very slowly if squid is running well; if squid is sick, watch for a tremendous growth in the size of this log!
- httpd-style access logs
Using the configuration described above, squid creates httpd-style logs at /usr/local/squid/logs/access_log. These logs are akin to the httpd logs found in /usr/local/wormbase/logs/access_log and should be used for primary log analysis.
*** Note that these logs are very similar to httpd access_logs with the addition of a single column. This column records the squid response and hierarchy code, showing how the request was handled.
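That extra column makes it easy to tally how requests are being handled. A sketch using a fabricated two-line sample in place of the real access_log; the assumption (check your logformat) is that the CODE/HIERARCHY pair is the final field:

```shell
# Fabricated sample lines standing in for /usr/local/squid/logs/access_log.
# The last field is the CODE:HIERARCHY column squid appends.
cat > /tmp/sample_access_log <<'EOF'
1.2.3.4 - - [01/Jan/2011:00:00:00 +0000] "GET /db/gene HTTP/1.0" 200 512 TCP_HIT:NONE
1.2.3.5 - - [01/Jan/2011:00:00:01 +0000] "GET /db/seq HTTP/1.0" 200 4096 TCP_MISS:DIRECT
EOF

# Tally the response codes (the part of the last field before the colon).
TALLY=$(awk '{ split($NF, a, ":"); count[a[1]]++ }
             END { for (c in count) print c, count[c] }' /tmp/sample_access_log)
echo "$TALLY"
```

Run against the real log, this gives a quick hit/miss profile of the cache.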
Acedb
Acedb produces two logs. These are both located in the "database" folder of the current database.
*** Both of these logs MUST be writable by the acedb user! If they aren't, the database will not be able to start. This is not always obvious - look for cycling xinetd requests attempting to launch the database in /var/log/messages.
- log.wrm
This slowly growing log contains information on the status of the server.
- serverlog.wrm
This quickly growing log records queries to the database. It is really only useful when trying to debug slow queries.
*** If serverlog.wrm reaches 2 GB in size, acedb may not be able to write to the file. This causes the database to crash, and xinetd to fail in launching it! Don't let this happen (see below)!
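A simple cron-able guard is to check the size of serverlog.wrm against the 2 GB limit. A sketch run against a temporary stand-in file; the real path (the database/ folder of the current database) is the value to substitute:

```shell
# Stand-in for the real <database>/database/serverlog.wrm path.
LOG=/tmp/serverlog.wrm
printf 'query log data\n' > "$LOG"

LIMIT=$((2 * 1024 * 1024 * 1024))   # the 2 GB danger threshold
SIZE=$(wc -c < "$LOG")

# Warn (or purge -- see the cron jobs below) before the limit is reached.
if [ "$SIZE" -ge "$LIMIT" ]; then
    STATUS="over limit: purge now"
else
    STATUS="ok"
fi
echo "$STATUS"
```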
Apache
Apache creates an error log and access log that can be used for debugging and watching direct requests to back end origin servers.
- /usr/local/wormbase/logs/access_log
A record of all direct requests to the httpd origin server.
*** Since squid intercepts all requests -- and serves many directly from the cache -- the httpd access_log is not a true indicator of access statistics.
- /usr/local/wormbase/logs/error_log
Errors encountered during execution of requests at the origin server.
Cron jobs
This section describes cron jobs that are used to keep WormBase running smoothly.
*** Some jobs are specific to certain machines. The appropriate machine is indicated in parentheses where applicable.
*** All cron jobs shown here should be entered in the root crontab unless otherwise indicated.
Monitoring
- RRD-Tools / graphical analysis of squid logs (fe.wormbase.org)
*/5 * * * * /usr/local/apache/htdocs/squid-monitor/poll.pl fe.wormbase.org:80
Update the RRDTool-based graphs at 5 minute intervals. These statistics can be viewed ONLY from localhost on fe.wormbase.org at:
http://localhost:81/squid-monitor/1day.cgi
Log rotation
The following cron entries keep the log files in check.
- squid (fe.wormbase.org)
# Rotate the squid httpd-style access logs once per day at 2 AM
# This cron job is only required on fe.wormbase.org
0 2 * * * /usr/local/wormbase-admin/log_maintenance/rotation/rotate_squid_logs.pl
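The rotation script itself is not reproduced here; a minimal sketch of the idea (datestamped rename, then recreate an empty log) against a temporary directory. The paths and the reopen mechanism are assumptions -- the real script must also get squid to reopen its log file, which is why rotation briefly touches the server:

```shell
# Temporary stand-in for the squid log directory.
LOGDIR=$(mktemp -d)
printf 'old entries\n' > "$LOGDIR/access_log"

# Rename the live log with a datestamp, then recreate an empty one.
# (The real script also restarts or signals squid so it reopens the
#  log file; "squid -k rotate" is one such mechanism.)
STAMP=$(date +%Y%m%d)
mv "$LOGDIR/access_log" "$LOGDIR/access_log.$STAMP"
: > "$LOGDIR/access_log"

ls "$LOGDIR"
```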
- acedb (unc, vab, crestone)
The Acedb serverlog.wrm grows to epic proportions and rarely contains information that is useful on a day-to-day basis. When the logs grow very large, log rotation becomes painfully slow and can even crash the server. Instead, we purge the logs by writing a single bit to the file with the following cron job.
# Reset the serverlog.wrm once an hour
30 * * * * /usr/local/wormbase-admin/log_maintenance/rotation/purge_acedb_logs.pl
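The core of such a purge can be as simple as truncating the file in place, which keeps the inode (and therefore any open file handle acedb holds) intact. A sketch against a temporary stand-in file; the real script and path are assumptions:

```shell
# Stand-in path for the real serverlog.wrm.
SERVERLOG=/tmp/acedb_serverlog.wrm
printf 'query history we no longer need\n' > "$SERVERLOG"

# Truncate in place rather than delete/rename: the file keeps its inode,
# so a process that already has it open continues writing to the same file.
: > "$SERVERLOG"

echo "bytes after purge: $(wc -c < "$SERVERLOG")"
```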
Synchronization
- Check for new releases of the database / update software
The satellite servers are managed by a script that utilizes Bio::GMOD. This script - gmod_update_installation-wormbase.pl - checks for a new live release of the database. If one is present, it downloads the databases from the development server and installs them. The WormBase software on the server is synchronized to a checked-out version of the source regardless of whether a new database is present.
gmod_update_installation-wormbase.pl relies on a configuration file that should be specified for each server. Typically, it is sufficient to check for new updates twice a day.
0 1,13 * * * /usr/local/bin/gmod_update_installation-wormbase.pl \
    --config /path/to/config/file
See updating_wormbase.pod for additional details.
- Maintain software rsync modules (brie3)
brie3 (dev.wormbase.org) hosts an rsync module of the WormBase software. This module is used to keep mirror sites and production nodes up-to-date.
NOTE: THIS IS NOW DEPRECATED (2/2/2006) IN FAVOR OF A MORE SIMPLIFIED PROCEDURE.
The following cron job creates a nightly cvs export of the "stable" tag on the main development branch. Select files that may inadvertently break mirrors and production nodes are purged from this export. For convenience, the module resides on the FTP server in /usr/local/wormbase/ftp/pub/wormbase/mirror.
The "wormbase-live" symlink always points to the most current version.
0 3 * * * /usr/local/wormbase-admin/update_scripts/export_stable_software.pl
See "wormbase-admin/docs/updating_wormbase.pod" for how to create the stable tag in the repository.
- Synchronize release notes with the checked out source (brie3)
This cron job ensures that release notes distributed by Sanger are kept in the production html/ directory. This rather inane task should probably just be handled once by the update process.
0 1 * * * /usr/local/wormbase-admin/update_scripts/rsync_misc.pl
Troubleshooting / when things go wrong
- Symptom: hardware crash of fe.wormbase.org
In the event that fe.wormbase.org suffers a catastrophic hardware failure, DNS entries can be modified to point www -> unc.wormbase.org. Once this change propagates, WormBase will be restored.
- Symptom: periodic success/failures of a single page
Because of the distributed nature of the WormBase infrastructure, troubleshooting problems is more complicated than on a single-server installation. In particular, if a single request succeeds and fails with roughly equal frequency, one of the origin servers may be down.
Check the following things in this order:
- Is Acedb running and responsive on the origin servers?
- Is Mysqld running and responsive on the origin servers?
- Is httpd running and responsive on the origin servers?
Check that httpd is running on all back-end origin servers. If it is not, ensure that there is sufficient disk space for logging and restart apache by:
brie6> sudo /usr/local/apache/bin/apachectl start
- Is squid running and responsive on fe.wormbase.org?
The least likely possibility is that squid itself has crashed on fe.wormbase.org. You can check this by:
fe> sudo /etc/rc.d/init.d/squid status
If squid is down, ensure that there is sufficient disk space and that the log directories are writable by the squid user (if these conditions are not met, squid will exit and send errors to /usr/local/squid/var/logs/cache.log). Restart squid with:
fe> sudo /etc/rc.d/init.d/squid start
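The two pre-flight checks above can be scripted. A sketch demonstrated against a temporary directory; on the front end you would point it at the real log directory, and run the writability test as the squid user (e.g. via sudo -u squid), which is an assumption about your sudo setup:

```shell
# Stand-in for /usr/local/squid/var/logs on fe.wormbase.org.
SQUID_LOGDIR=$(mktemp -d)

# 1. Free space on the filesystem that holds the logs.
df -k "$SQUID_LOGDIR"

# 2. Writability.  The real check should run as the squid user,
#    e.g. "sudo -u squid test -w /usr/local/squid/var/logs".
if [ -w "$SQUID_LOGDIR" ]; then
    WRITABLE=yes
else
    WRITABLE=no
fi
echo "writable: $WRITABLE"
```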
Appendices
Appendix 2: Squid response and hierarchy codes.
Response codes
The following codes are appended to the squid-generated, httpd style access logs. The TCP_ codes refer to requests on the HTTP port (usually 3128). The UDP_ codes refer to requests on the ICP port (usually 3130) and do not apply to the current WormBase configuration. These codes (in conjunction with the hierarchy codes listed below) describe how the cache handled the request.
- TCP_HIT
A valid copy of the requested object was in the cache.
- TCP_MISS
The requested object was not in the cache.
- TCP_REFRESH_HIT
The requested object was cached but STALE. The IMS (If-Modified-Since) query for the object resulted in "304 not modified".
- TCP_REF_FAIL_HIT
The requested object was cached but STALE. The IMS query failed and the stale object was delivered.
- TCP_REFRESH_MISS
The requested object was cached but STALE. The IMS query returned the new content.
- TCP_CLIENT_REFRESH_MISS
The client issued a "no-cache" pragma, or some analogous cache control command along with the request. Thus, the cache has to refetch the object.
- TCP_IMS_HIT
The client issued an IMS request for an object which was in the cache and fresh.
- TCP_SWAPFAIL_MISS
The object was believed to be in the cache, but could not be accessed.
- TCP_NEGATIVE_HIT
Request for a negatively cached object, e.g. "404 not found", which the cache believes to be inaccessible. Also refer to the explanations for negative_ttl in your squid.conf file.
- TCP_MEM_HIT
A valid copy of the requested object was in the cache and it was in memory, thus avoiding disk accesses.
- TCP_DENIED
Access was denied for this request.
- TCP_OFFLINE_HIT
The requested object was retrieved from the cache during offline mode. Offline mode never validates any object; see offline_mode in the squid.conf file.
- UDP_HIT
A valid copy of the requested object was in the cache.
- UDP_MISS
The requested object is not in this cache.
- UDP_DENIED
Access was denied for this request.
- UDP_INVALID
An invalid request was received.
- UDP_MISS_NOFETCH
During "-Y" startup, or during frequent failures, a cache in hit-only mode will return either UDP_HIT or this code. Neighbours will thus only fetch hits.
- NONE
Seen with errors and cachemgr requests.
Hierarchy codes
The following hierarchy codes are used with Squid-2. They convey information about how the request was handled.
- NONE
For TCP HIT, TCP failures, cachemgr requests and all UDP requests, there is no hierarchy information.
- DIRECT
The object was fetched from the origin server.
- SIBLING_HIT
The object was fetched from a sibling cache which replied with UDP_HIT.
- PARENT_HIT
The object was requested from a parent cache which replied with UDP_HIT.
- DEFAULT_PARENT
No ICP queries were sent. This parent was chosen because it was marked "default" in the config file.
- SINGLE_PARENT
The object was requested from the only parent appropriate for the given URL.
- FIRST_UP_PARENT
The object was fetched from the first parent in the list of parents.
- NO_PARENT_DIRECT
The object was fetched from the origin server, because no parents existed for the given URL.
- FIRST_PARENT_MISS
The object was fetched from the parent with the fastest (possibly weighted) round trip time.
- CLOSEST_PARENT_MISS
This parent was chosen because it had the lowest RTT measurement to the origin server. See also the closest-only peer configuration option.
- CLOSEST_PARENT
The parent selection was based on our own RTT measurements.
- CLOSEST_DIRECT
Our own RTT measurements returned a shorter time than any parent.
- NO_DIRECT_FAIL
The object could not be requested because of a firewall configuration, see also never_direct and related material, and no parents were available.
- SOURCE_FASTEST
The origin site was chosen, because the source ping arrived fastest.
- ROUNDROBIN_PARENT
No ICP replies were received from any parent. The parent was chosen, because it was marked for round robin in the config file and had the lowest usage count.
- CACHE_DIGEST_HIT
The peer was chosen, because the cache digest predicted a hit. This option was later replaced in order to distinguish between parents and siblings.
- CD_PARENT_HIT
The parent was chosen, because the cache digest predicted a hit.
- CD_SIBLING_HIT
The sibling was chosen, because the cache digest predicted a hit.
- NO_CACHE_DIGEST_DIRECT
This output seems to be unused?
- CARP
The peer was selected by CARP.
- ANY_PARENT
part of src/peer_select.c:hier_strings[].
- INVALID CODE
part of src/peer_select.c:hier_strings[].
Almost any of these may be preceded by 'TIMEOUT_' if the two-second (default) timeout occurs while waiting for all ICP replies to arrive from neighbors; see also the icp_query_timeout configuration option.
See Also
updating_wormbase.pod - The complete guide for updating WormBase
wormbase_infrastructure.pod - The WormBase hardware and software infrastructure
Author
Tharris 13:42, 17 November 2005 (EST)
Original attribution: Todd Harris (harris@cshl.edu)
$Id: wormbase_administration.pod,v 1.2 2005/08/31 20:23:36 todd Exp $
Copyright (c) 2004-2005 Cold Spring Harbor Laboratory