Difference between revisions of "Design Specs: Database Connectivity"

From WormBaseWiki

Revision as of 15:26, 17 February 2010

Objectives

This component is part of the WormBase::API. It needs to be able to connect to multiple databases (in some cases, multiple instances of the same database) and should be fault tolerant if a server or database crashes.

Plans

API/Role/Service.pm: a Moose Role module that defines the common attributes and behavior for the different databases.

e.g.

'all_host': an array attribute holding all of the available hosts from the configuration file.

connect: abstract method for the database connection; failover is implemented here, for example by picking a host at random:

$host = $array[ rand @array ];
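A minimal sketch of what such a Role might look like. The attribute 'all_host' and the abstract connect() come from the spec above; the helper name select_host and the attribute types are assumptions for illustration:

```perl
package WormBase::API::Role::Service;

use Moose::Role;

# All available hosts for this service, loaded from the configuration file.
has 'all_host' => (
    is      => 'ro',
    isa     => 'ArrayRef[Str]',
    default => sub { [] },
);

# Each consuming class (acedb.pm, gff.pm) must supply its own connect().
requires 'connect';

# Pick one host at random, as a simple form of load spreading / failover.
sub select_host {
    my $self  = shift;
    my @hosts = @{ $self->all_host };
    return $hosts[ rand @hosts ];
}

1;
```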

This Role is consumed by each database class (Service module): API/Service/acedb.pm and API/Service/gff.pm.

Different databases use different modules for connection purposes; here we have:

AceDB:
  use Ace;
  Ace->connect

GFF database:
  use Bio::DB::GFF;
  Bio::DB::GFF->new(%$gff_args)
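For illustration, a consuming Service class might look roughly like this. The select_host helper, the port number, and the constructor arguments are assumptions, not the actual WormBase code:

```perl
package WormBase::API::Service::acedb;

use Moose;
use Ace;

# Pull in 'all_host', select_host(), and the connect() requirement.
with 'WormBase::API::Role::Service';

# Concrete implementation of the abstract connect() from the Role.
sub connect {
    my $self = shift;
    my $host = $self->select_host;    # random host from 'all_host'
    return Ace->connect(
        -host => $host,
        -port => 2005,                # assumed sgifaceserver port
    );
}

1;
```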

Things to think about

Persistent connections: Apache::DBI, DBI's connect_cached, DBIx::Class.
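For plain DBI, connect_cached returns the same handle for identical connection arguments within a process, revalidating it with ping and reconnecting if it has died. A minimal example (the DSN and credentials are placeholders):

```perl
use DBI;

my ($user, $password) = ('wb', 'secret');    # placeholders

# Repeated calls with the same arguments return the cached handle
# instead of opening a new connection each time.
my $dbh = DBI->connect_cached(
    'dbi:mysql:database=wormbase;host=localhost',
    $user, $password,
    { RaiseError => 1, AutoCommit => 1 },
);
```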

Also, because Apache::DBI's connection caching never checks to see if the process and/or thread id changed, it will give you an invalid $dbh from a previous process/thread if you use a connection, then thread/fork, then try to re-connect to the same source, which is a Bad Thing.

DBIx::Class already manages its connections for you, and therefore it cannot benefit from Apache::DBI under any scenario. It makes one connection per-process, and keeps that connection persistent, reconnecting only if the connection appears to have died, or if you fork/thread over to another process/thread-id. The only Apache::DBI issue in DBIx::Class is that Apache::DBI will actually thwart DBIx::Class's connection management code, and cause it to use the same (and invalid) connection in a new process, in cases such as (as mentioned above) if you make a DBI connection before forking in a prefork mod_perl server.
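One common guard against the fork/thread problem described above is to record the process id alongside the cached handle and discard the handle whenever $$ changes. A sketch (not from the WormBase code; DSN and credentials are placeholders):

```perl
use DBI;

my ($dsn, $user, $password) = ('dbi:mysql:wormbase', 'wb', 'secret');

my ($cached_dbh, $owner_pid);

sub dbh {
    # Reuse the handle only if it was created in this process and is alive;
    # a $dbh inherited across fork() must never be reused.
    if ($cached_dbh && $owner_pid == $$ && $cached_dbh->ping) {
        return $cached_dbh;
    }
    $owner_pid  = $$;
    $cached_dbh = DBI->connect($dsn, $user, $password, { RaiseError => 1 });
    return $cached_dbh;
}
```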

Resources

The current connectivity code is part of the API. The API, in turn, lives in the lib directory of the WormBase Mercurial repository:

http://bitbucket.org/tharris/wormbase/src/tip/lib/WormBase/API/Service/

Currently, each database is a "Service". When the API is instantiated, connections are made to each required service and the database handles are cached. When a database handle is needed, the service first checks whether it is live; if not, it refreshes the connection. This would probably be the ideal location to implement failover: check the current connection and, if it is dead, try a different database server.
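The failover suggested here could be sketched as: ping the current handle, and if it is dead, walk the configured hosts until one connects. The method names connect_to_host and has_dbh below are assumptions for illustration:

```perl
sub dbh_with_failover {
    my $self = shift;

    # Reuse the live handle if we have one.
    return $self->dbh if $self->has_dbh && $self->dbh->ping;

    # Otherwise try each configured host in turn until one answers.
    for my $host ( @{ $self->all_host } ) {
        my $dbh = eval { $self->connect_to_host($host) };
        if ($dbh) {
            $self->log->debug("failed over to $host");
            return $dbh;
        }
    }
    die "all database hosts are down";
}
```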


http://bitbucket.org/tharris/wormbase/src/9009075c4c5a/lib/WormBase/API/Role/Service/gff.pm


  around 'dbh' => sub {
      my $orig = shift;
      my $self = shift;

      my $species = $self->species;

      # Do we already have a dbh? HOW TO TEST THIS WITH HASH REF?
      if ($self->has_dbh) {
          $self->log->debug("gff-dbh for $species exists and is alive!");
          return $self->$orig;
      }

      # No handle yet: connect and return the fresh handle.
      $self->log->debug("gff-dbh for $species doesn't exist yet; trying to connect");
      return $self->connect($species);
  };


Note also that these Services use Moose "Roles" to define shared methods and variables like connect().

Here's the acedb service:

http://bitbucket.org/tharris/wormbase/src/tip/lib/WormBase/API/Service/acedb.pm

The acedb service "consumes" the generic Service role:

http://bitbucket.org/tharris/wormbase/src/9009075c4c5a/lib/WormBase/API/Role/Service.pm

This nomenclature and structure are a bit confusing and probably need to be streamlined/refined!