Design Specs: Scalability

From WormBaseWiki

Objectives

To build a robust and scalable platform for hosting the next generation of the WormBase website.

Reported Issues (bitbucket.org)

Limitations of the Current System

As of Jan 2010, the WormBase website resides on a heterogeneous cluster of web and database servers. A single server running the squid reverse proxy software load balances across these servers.

Although this system scales easily and has served WormBase well for six years, it is not ideal for the following reasons.

1. Multiple single points of failure that require manual intervention to resolve

2. Difficulty of configuring and managing the reverse proxy software

3. Lack of true fault tolerance in the system

4. Heterogeneity complicates management, backups, production releases


Page and Page Fragment Caching

Candidate approaches: CHI or PageCache, or caching via Template Toolkit.

Catalyst Modules:

Catalyst::Plugin::Cache
Catalyst::Plugin::PageCache
Mailing list discussion: [Catalyst] Page fragment caching, from Jan 20, 2010
use strict;
use warnings;
use Digest::SHA1 'sha1_hex';
use base 'Catalyst::Controller';

sub foo : Local {
    my ($self, $c) = @_;

    my $name = $c->req->params->{name};
    my $age  = $c->req->params->{age};
    my $page = $c->req->params->{page};

    # Key the fragment on every parameter that affects its content.
    # Join with a separator so ('ab', 'c') and ('a', 'bc') hash differently.
    my $hash = sha1_hex(join "\0", 'foo',
        map { defined $_ ? $_ : '' } $name, $age, $page);
    my $cache     = $c->cache;
    my $page_part = $cache->get($hash);

    unless (defined $page_part) {
        # Cache miss: query the database (or get the data from somewhere else).
        my $obj = $c->model("DB::Table")->search({
            name => { -like => "%$name%" },
            age  => $age,
        }, {
            order_by => 'name',
            rows     => 20,
            page     => $page,
        });

        $c->stash(obj => $obj);
        $page_part = $c->view('TT')->render($c, 'template/for/that/part/of/the/page.tt');

        # Keep the rendered fragment for 600 seconds.
        $cache->set($hash, $page_part, 600);
    }

    $c->stash(page_part => $page_part);
}


This way it works fine, although if someone has an easier and/or nicer solution, please tell us.

PageCache conflicts with another plugin; if I remember correctly, it is Catalyst::Plugin::Unicode.

Template::Plugin::Cache, which makes caching parts of a page very easy, doesn't work well with templates that contain special UTF-8 characters (I was told this is because of the FileCache module, and I heard that it could work with a CHI object).

Anyway, if somebody has recommendations for speed improvements with Catalyst, please tell us (I tested a PHP page with many dynamic types of data that serves 44 requests/second, but I wasn't able to create even a much simpler page with Catalyst that displays as fast).
Yes, I know, the PHP page is low level, harder to create and to maintain, and gives bad errors, but I can't convince anyone that Catalyst is good if it is not fast enough, so any tips for using caches or other ways of improving the speed are welcome.
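
The get-or-compute pattern used in the controller above can be tried outside Catalyst. The following is a minimal sketch, not the poster's code: the %store hash stands in for $c->cache, and the names fragment_key and get_or_compute are hypothetical. It only shows that a second request with the same parameters skips the expensive render.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical in-memory stand-in for $c->cache: key => [expires_at, value].
my %store;

# Build a cache key from a fragment name and its parameters.
# The "\0" separator keeps ('ab', 'c') distinct from ('a', 'bc').
sub fragment_key {
    my ($fragment, @params) = @_;
    return sha1_hex(join "\0", $fragment,
        map { defined $_ ? $_ : '' } @params);
}

# Return the cached value for $key if it has not expired; otherwise call
# $compute, store the result for $ttl seconds, and return it.
sub get_or_compute {
    my ($key, $ttl, $compute) = @_;
    my $entry = $store{$key};
    return $entry->[1] if $entry && $entry->[0] > time;
    my $value = $compute->();
    $store{$key} = [ time + $ttl, $value ];
    return $value;
}

my $renders = 0;
my $render  = sub { $renders++; return "<ul><li>row</li></ul>" };

my $key    = fragment_key('foo', 'smith', 42, 1);
my $first  = get_or_compute($key, 600, $render);   # miss: renders
my $second = get_or_compute($key, 600, $render);   # hit: served from %store
print "renders: $renders\n";                       # prints "renders: 1"
```

A real deployment would replace %store with a shared backend, since a per-process hash is not visible to other web server processes.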


> If you are using Template Toolkit,
>
> http://search.cpan.org/dist/Template-Plugin-Cache/

Agree, this will do what you want.  To have it use your Catalyst cache,
just do something like this:

  USE cache = Cache('cache' => c.cache);

You can use it to cache either a whole page or just part of one, since
you can "cache.inc()" or "cache.proc()" to INCLUDE or PROCESS any
template (or BLOCK? not sure...) from inside another one.



Catalyst::Plugin::Cache will happily use CHI, or you can bind CHI in as a model, whatever..

I don't see why there needs to be anything CHI specific?

(But I'd welcome doc patches / wiki / advent articles showing how to use CHI).
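
As a starting point for such documentation, here is a configuration sketch for pointing Catalyst::Plugin::Cache at CHI. It is an assumption-laden example rather than a tested WormBase setup: MyApp is a hypothetical application class, the Memory driver is chosen only because it needs no external service, and the exact backend options should be checked against the Catalyst::Plugin::Cache and CHI documentation for the installed versions.

```perl
package MyApp;    # hypothetical application class
use strict;
use warnings;
use Catalyst qw/Cache/;    # loads Catalyst::Plugin::Cache

__PACKAGE__->config(
    'Plugin::Cache' => {
        backend => {
            class  => 'CHI',       # backend class the plugin instantiates
            driver => 'Memory',    # assumption: swap for a shared driver in production
            global => 1,           # CHI Memory driver: one store per process
        },
    },
);

__PACKAGE__->setup;

1;
```

With this in place, $c->cache in a controller should return the CHI object, so get and set calls like those in the fragment caching example go through CHI.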

Page Caching and Delivery via ETags or md5sums

Hi Catalysters,

For some actions of a Catalyst app, I would like to implement conditional GET (using If-Modified-Since HTTP header), where the timestamp of one config file decides whether the page should be refreshed or not  --- this is because that page is quite expensive to compute.

This scenario sounds like a common thing to do, so I expected to find some Catalyst plugins/extensions in CPAN to do that, but didn't find any. Did I just miss some CPAN modules, or should I really start from scratch ?

Thanks in advance for any hints,

Laurent Dami
-- 
     
There's a good example using the 'Cache-Control' header in the new
Catalyst book, Chapter 11, section 'Deploy with a Cache'.

Kiffin Gish <kiffin.gish@planet.nl>
Gouda, The Netherlands
--

Octavian Rasnita	

Try Catalyst::Plugin::Cache::HTTP.
--
Aristotle Pagaltzis	

I agree with the others who have responded, but they didn’t
explain why theirs was the right answer, so:

You’re not checking whether any state has changed since the
last GET to decide whether to recompute. That is the case in
which conditional GET would be appropriate. That would allow
you to avoid recomputing the page indefinitely as long as no
state changes necessitate it, but it requires that the clients
keep asking.

Instead you merely want to avoid doing any recomputation for
some predefined period of time, regardless of your state. This
is a case for caching: since you aren’t going to recompute the
page until said time has passed, you may as well tell the client
that it’s superfluous for them to try asking again before that
period is up.

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>

--

Dami Laurent (PJ)	
    
Indeed, this is exactly what I want to do. The app has a config file (not a Catalyst
config file, but another file having to do with business logic), and some super-users
have a mechanism for hot uploading of a new config to the server, at any time.
A few app pages are expensive to compute, and they depend on the client and on that config file. So clients should keep asking for those pages at each request, and depending on the If-Modified-Since header and on the timestamp for the config file, the server can decide if it's worth recomputing the page for that client, or rather send a cheap 304 Not Modified.

Cheers, L. Dami

--
Aristotle Pagaltzis	

I suggest you send an ETag and check `If-None-Match` (possibly
just a hash of the timestamp for the config file) instead of
(or if you have HTTP/1.0 clients, in addition to) relying on the
timestamp and `If-Modified-Since`.

Beyond that:

It’s easy to write a module that will conserve bandwidth using
these headers, by hashing the body after all computation is done
and checking whether to send it or just a 304.

But conserving server CPU requires intimate knowledge of both the
model and the structure of the controllers. It seems hard to find
a generically useful abstraction beyond a utility method or two
for setting the headers.

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>
--
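
The ETag and If-None-Match suggestion above can be sketched in plain Perl. This is an illustration under stated assumptions, not code from the thread: the function names are hypothetical, the ETag is a SHA-1 of the config file's modification time only, and the handler returns a (status, headers, body) list instead of touching a real request. In a Catalyst action the same values would presumably come from $c->req->header('If-None-Match') and go out through $c->res.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Derive an ETag from the config file's modification time.
# The surrounding double quotes are part of the ETag syntax.
sub etag_for_mtime {
    my ($mtime) = @_;
    return '"' . sha1_hex("config:$mtime") . '"';
}

# True if the If-None-Match value lists $etag, or is "*".
# A W/ prefix is ignored because If-None-Match uses weak comparison.
sub etag_matches {
    my ($if_none_match, $etag) = @_;
    return 0 unless defined $if_none_match && length $if_none_match;
    for my $candidate (split /\s*,\s*/, $if_none_match) {
        $candidate =~ s/^\s+|\s+$//g;
        return 1 if $candidate eq '*';
        $candidate =~ s{^W/}{};
        return 1 if $candidate eq $etag;
    }
    return 0;
}

# Hypothetical handler: returns (status, headers hashref, body).
# $render runs only when the client's copy is stale.
sub conditional_get {
    my ($if_none_match, $config_mtime, $render) = @_;
    my $etag = etag_for_mtime($config_mtime);
    return (304, { ETag => $etag }, '') if etag_matches($if_none_match, $etag);
    return (200, { ETag => $etag }, $render->());
}

my $renders = 0;
my $render  = sub { $renders++; return "<p>expensive page</p>" };

my ($status1, $headers1) = conditional_get(undef, 1263945600, $render);
my ($status2) = conditional_get($headers1->{ETag}, 1263945600, $render);
my ($status3) = conditional_get($headers1->{ETag}, 1263949200, $render);
print "$status1 $status2 $status3 renders=$renders\n";   # prints "200 304 200 renders=2"
```

Because the pages in question also depend on the client, a real ETag would need to fold some client identity into the hashed string as well; the sketch leaves that out.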

Bill Moseley
Be careful about using timestamps if you are running multiple web servers behind a load balancer (or may expand to where you will be behind a balancer).  Here's a read on Etags:

 http://developer.yahoo.net/blog/archives/2007/07/high_performanc_11.html

For resources such as css, js, images I tend to create URLs that include an md5.  Those include cache headers that don't expire and thus when the content changes the URL changes. 

I have also done that with text/html pages, but it's less common.  For a config file you can send the config through Object::Signature to get an md5.  You could recalculate and cache that whenever a new config is uploaded.

For "static" pages (for non-logged in users) the pages tend to get cached for some number of minutes as it's not critical that a change is seen exactly the same time by all users.  Dynamic content is not cached, of course, but elements of the page may be cached in memcached.

-- 
Bill Moseley
moseley@hank.org
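
The md5-in-URL approach described above for css, js, and image resources can be illustrated with a short sketch. The helper name fingerprinted_url and the URL layout (digest inserted before the file extension) are assumptions made for this example; the thread does not say how the URLs were actually built.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: insert the content's md5 before the file extension,
# so "/static/site.css" becomes "/static/site.<md5>.css". Paths without an
# extension get the digest appended instead.
sub fingerprinted_url {
    my ($path, $content) = @_;
    my $digest = md5_hex($content);
    my $url    = $path;
    $url .= ".$digest" unless $url =~ s{(\.[^./]+)$}{.$digest$1};
    return $url;
}

# Headers for a resource whose URL changes whenever its content changes:
# clients and proxies may keep it for a year without revalidating.
sub far_future_headers {
    return { 'Cache-Control' => 'public, max-age=31536000' };
}

my $v1 = fingerprinted_url('/static/site.css', "body { color: black }\n");
my $v2 = fingerprinted_url('/static/site.css', "body { color: navy }\n");
print "$v1\n$v2\n";    # two different URLs for two versions of the file
```

Since the digest depends only on the content, every web server behind the load balancer produces the same URL for the same file, which avoids the per-server timestamp problem mentioned above.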