Difference between revisions of "Design Specs: Database Connectivity"

From WormBaseWiki
Jump to navigationJump to search
 
(46 intermediate revisions by one other user not shown)
Line 4: Line 4:
 
databases (multiple instances of the same database in some cases).
 
databases (multiple instances of the same database in some cases).
 
Should be fault tolerant if a server or database crashes.
 
Should be fault tolerant if a server or database crashes.
 +
 +
==Implementation==
 +
 +
http://bitbucket.org/tharris/wormbase/src/tip/lib/WormBase/API/Role/Service.pm
 +
a Moose Role module which defines the common attributes and behavior for different databases.
 +
 +
eg.
 +
 +
'hosts' array reference attribute which has all the available hosts from configuration file.
 +
 +
reconnect: abstract method for the database connection, fail-over is also implementation here:
 +
*use Moose 'around' method modifier to check on the db handle every time it is accessed
 +
 
 +
  around 'dbh' sub {}
 +
 +
Note that attribute accessors also count as methods for the purposes of satisfying the requirements of a role.
 +
 +
*do "reconnect" if dbh is not set, undef or time-out(AceDB)
 +
* will try to connect for maximum $ReconnectMaxTries(now this is hard coded as 5, will replace by conf) times
 +
* host is picked randomly among the pool
 +
 
 +
  my $host = $self->hosts->[ rand @{$self->hosts} ];
 +
 +
This Role is consumed by each Database Classes(Service module) 
 +
*http://bitbucket.org/tharris/wormbase/src/tip/lib/WormBase/API/Service/acedb.pm
 +
*http://bitbucket.org/tharris/wormbase/src/tip/lib/WormBase/API/Service/gff.pm 
 +
 +
each has its own "connect" method and "ping" method(mostly just a wrapper)
 +
<pre>
 +
AceDB
 +
use Ace;
 +
sub connect {
 +
    my $self = shift;
 +
    my ($host,$port,$user,$pass)=@_;
 +
    return Ace->connect(-host => $host,
 +
      -port => $self->port,
 +
      -user=>$self->user,
 +
      -pass=>$self->pass,
 +
    );
 +
    #   @cache);
 +
}
 +
 +
GFF database
 +
use Bio::DB::GFF;
 +
sub connect {
 +
    my $self = shift;
 +
    my ($host,$port,$user,$pass)=@_;
 +
    return Bio::DB::GFF->new( -user => $user,
 +
      -pass => $pass,
 +
      -dsn => "dbi:mysql:database=".$self->species.";host=" . $host,
 +
    );
 +
}
 +
</pre>
 +
 +
Minor changes in main API http://bitbucket.org/tharris/wormbase/src/tip/lib/WormBase/API.pm
 +
* took out the BUILD subroutine
 +
* modified gff_dsn() method
 +
 +
==Testing==
 +
 +
prove -lv t/WormBase/API/Object/Gene.t  http://bitbucket.org/tharris/wormbase/src/tip/t/WormBase/API/Object/Gene.t
 +
 +
<pre>
 +
ok 1 - use WormBase::API;
 +
ok 2 - Constructed WormBase::API object ok
 +
[DEBUG] acedb dbh for specie  doesn't exist or is not alive; trying to connect
 +
[INFO] trytime 1: Connecting to  acedb
 +
[DEBUG]      using the following parameters:
 +
[DEBUG]        aceserver.cshl.edu:2005
 +
[INFO]    --> succesfully established connection to  acedb on aceserver.cshl.edu
 +
[DEBUG] acedb dbh for specie  exists and is alive!
 +
ok 3 - The object isa WormBase::API::Object::Gene
 +
...
 +
...
 +
[DEBUG] gff dbh for specie c_elegans doesn't exist or is not alive; trying to connect
 +
[INFO] trytime 1: Connecting to  gff
 +
[DEBUG]      using the following parameters:
 +
Use of uninitialized value in concatenation (.) or string at /home/xshi/work/project/wormbase/website-2.0/lib/WormBase/API/Role/Service.pm line 106.
 +
[DEBUG]        aceserver.cshl.edu:
 +
[INFO]    --> succesfully established connection to  gff on aceserver.cshl.edu
 +
[DEBUG] gff dbh for specie c_elegans exists and is alive!
 +
[DEBUG] gff dbh for specie c_elegans exists and is alive!
 +
...
 +
</pre>
 +
 +
==Question==
 +
When Ace time out would the dbh becomes undef?
 +
How useful is the Ace reopen method? For now I just re-connect.
 +
 +
==Continued Work==
 +
 +
setting up attributes such as hosts, user, password, caching for AceDB etc through Configuration file.
 +
 +
  Catalyst::Plugin::ConfigLoader
 +
 +
 +
use Ace (); to prevent eg. the connect sub from Ace being imported into our namespaces.
 +
 +
==Things to think about==
 +
===persistent connections===
 +
 +
Apache::DBI        DBI  connect_cached    DBIx::Class
 +
 +
Also, because Apache::DBI's connection caching never checks to see if
 +
the process and/or thread id changed, it will give you an invalid $dbh
 +
from a previous process/thread if you use a connection, then
 +
thread/fork, then try to re-connect to the same source, which is a Bad
 +
Thing.
 +
 +
DBIx::Class already manages its connections for you, and therefore it
 +
cannot benefit from Apache::DBI under any scenario.  It makes one
 +
connection per-process, and keeps that connection persistent,
 +
reconnecting only if the connection appears to have died, or if you
 +
fork/thread over to another process/thread-id.  The only Apache::DBI
 +
issue in DBIx::Class is that Apache::DBI will actually thwart
 +
DBIx::Class's connection management code, and cause it to use the same
 +
(and invalid) connection in a new process, in cases such as (as
 +
mentioned above) if you make a DBI connection before forking in a
 +
prefork mod_perl server.
  
 
==Resources==
 
==Resources==
Line 12: Line 131:
  
 
Currently, each database is a "Service".  When the API is instantiated, connections are made to each required service.  Database handles are cached.  When a database handle is needed, the service first checks to see if it is live, if not it refreshes the connection.  This would probably be the ideal location to implement failover -- check the current connection, if dead, try a different database server.
 
Currently, each database is a "Service".  When the API is instantiated, connections are made to each required service.  Database handles are cached.  When a database handle is needed, the service first checks to see if it is live, if not it refreshes the connection.  This would probably be the ideal location to implement failover -- check the current connection, if dead, try a different database server.
 +
 
 +
*Note that attribute accessors also count as methods for the purposes of satisfying the requirements of a role.
 +
 +
http://bitbucket.org/tharris/wormbase/src/9009075c4c5a/lib/WormBase/API/Role/Service/gff.pm
 +
<pre>
  
 +
  around 'dbh' => sub {
 +
    my $orig = shift;
 +
    my $self = shift;
 +
   
 +
    my $species = $self->species;
 +
   
 +
    # Do we already have a dbh? HOW TO TEST THIS WITH HASH REF?
 +
    if ($self->has_dbh) {
 +
$self->log->debug("    gff-dbh for $species exists and is alive!");
 +
return $self->$orig;
 +
    } else {
 +
$self->log->debug("    gff-dbh for $species doesn't exist yet; trying to connect");
 +
my $dbh = $self->connect($species);
 +
    }
 +
};
 +
</pre>
 +
 +
 
 
Note also that these Services use Moose "Roles" to define shared methods and variables like connect().
 
Note also that these Services use Moose "Roles" to define shared methods and variables like connect().
  
Line 24: Line 166:
  
 
This nomenclature and structure is a bit confusing and probably needs to be streamlined/refined!!
 
This nomenclature and structure is a bit confusing and probably needs to be streamlined/refined!!
 +
 +
 +
 +
[[Category:Developer documentation]]

Latest revision as of 18:48, 11 August 2010

Objectives

Part of the WormBase::API. Needs to be able to connect to multiple databases (multiple instances of the same database in some cases). Should be fault tolerant if a server or database crashes.

Implementation

http://bitbucket.org/tharris/wormbase/src/tip/lib/WormBase/API/Role/Service.pm a Moose Role module which defines the common attributes and behavior for different databases.

eg.

'hosts' array reference attribute which has all the available hosts from configuration file.

reconnect: abstract method for the database connection, fail-over is also implementation here:

  • use Moose 'around' method modifier to check on the db handle every time it is accessed
  around 'dbh' sub {}

Note that attribute accessors also count as methods for the purposes of satisfying the requirements of a role.

  • do "reconnect" if dbh is not set, undef or time-out(AceDB)
  • will try to connect for maximum $ReconnectMaxTries(now this is hard coded as 5, will replace by conf) times
  • host is picked randomly among the pool
  my $host = $self->hosts->[ rand @{$self->hosts} ];

This Role is consumed by each Database Classes(Service module)

each has its own "connect" method and "ping" method(mostly just a wrapper)

AceDB
use Ace;
sub connect {
    my $self = shift;
    my ($host,$port,$user,$pass)=@_;
    return Ace->connect(-host => $host,
			      -port => $self->port,
			      -user=>$self->user,
			      -pass=>$self->pass,
			    );
    #			   @cache);
}

GFF database
use Bio::DB::GFF;
sub connect {
    my $self = shift;
    my ($host,$port,$user,$pass)=@_;
    return Bio::DB::GFF->new( -user => $user,
			      -pass => $pass,
			      -dsn => "dbi:mysql:database=".$self->species.";host=" . $host,
    );
}

Minor changes in main API http://bitbucket.org/tharris/wormbase/src/tip/lib/WormBase/API.pm

  • took out the BUILD subroutine
  • modified gff_dsn() method

Testing

prove -lv t/WormBase/API/Object/Gene.t  http://bitbucket.org/tharris/wormbase/src/tip/t/WormBase/API/Object/Gene.t
ok 1 - use WormBase::API;
ok 2 - Constructed WormBase::API object ok
[DEBUG] acedb dbh for specie  doesn't exist or is not alive; trying to connect
[INFO] trytime 1: Connecting to  acedb
[DEBUG]      using the following parameters:
[DEBUG]        aceserver.cshl.edu:2005
[INFO]    --> succesfully established connection to  acedb on aceserver.cshl.edu
[DEBUG] acedb dbh for specie  exists and is alive!
ok 3 - The object isa WormBase::API::Object::Gene
...
...
[DEBUG] gff dbh for specie c_elegans doesn't exist or is not alive; trying to connect
[INFO] trytime 1: Connecting to  gff
[DEBUG]      using the following parameters:
Use of uninitialized value in concatenation (.) or string at /home/xshi/work/project/wormbase/website-2.0/lib/WormBase/API/Role/Service.pm line 106.
[DEBUG]        aceserver.cshl.edu:
[INFO]    --> succesfully established connection to  gff on aceserver.cshl.edu
[DEBUG] gff dbh for specie c_elegans exists and is alive!
[DEBUG] gff dbh for specie c_elegans exists and is alive!
...

Question

When Ace time out would the dbh becomes undef? How useful is the Ace reopen method? For now I just re-connect.

Continued Work

setting up attributes such as hosts, user, password, caching for AceDB etc through Configuration file.

 Catalyst::Plugin::ConfigLoader


use Ace (); to prevent eg. the connect sub from Ace being imported into our namespaces.

Things to think about

persistent connections

Apache::DBI DBI connect_cached DBIx::Class

Also, because Apache::DBI's connection caching never checks to see if the process and/or thread id changed, it will give you an invalid $dbh from a previous process/thread if you use a connection, then thread/fork, then try to re-connect to the same source, which is a Bad Thing.

DBIx::Class already manages its connections for you, and therefore it cannot benefit from Apache::DBI under any scenario. It makes one connection per-process, and keeps that connection persistent, reconnecting only if the connection appears to have died, or if you fork/thread over to another process/thread-id. The only Apache::DBI issue in DBIx::Class is that Apache::DBI will actually thwart DBIx::Class's connection management code, and cause it to use the same (and invalid) connection in a new process, in cases such as (as mentioned above) if you make a DBI connection before forking in a prefork mod_perl server.

Resources

The current connectivity code is part of the API. The API, in turn, is inside of the lib directory of the WormBase mercurial repository:

http://bitbucket.org/tharris/wormbase/src/tip/lib/WormBase/API/Service/

Currently, each database is a "Service". When the API is instantiated, connections are made to each required service. Database handles are cached. When a database handle is needed, the service first checks to see if it is live, if not it refreshes the connection. This would probably be the ideal location to implement failover -- check the current connection, if dead, try a different database server.

  • Note that attribute accessors also count as methods for the purposes of satisfying the requirements of a role.

http://bitbucket.org/tharris/wormbase/src/9009075c4c5a/lib/WormBase/API/Role/Service/gff.pm


  around 'dbh' => sub {
    my $orig = shift;
    my $self = shift;
    
    my $species = $self->species;
    
    # Do we already have a dbh? HOW TO TEST THIS WITH HASH REF?
    if ($self->has_dbh) {
	$self->log->debug("     gff-dbh for $species exists and is alive!");
	return $self->$orig;
    } else {
	$self->log->debug("     gff-dbh for $species doesn't exist yet; trying to connect");
	my $dbh = $self->connect($species);
    }
 };


Note also that these Services use Moose "Roles" to define shared methods and variables like connect().

Here's the acedb service:

http://bitbucket.org/tharris/wormbase/src/tip/lib/WormBase/API/Service/acedb.pm

The acedb service "consumes" the generic Service role:

http://bitbucket.org/tharris/wormbase/src/9009075c4c5a/lib/WormBase/API/Role/Service.pm

This nomenclature and structure is a bit confusing and probably needs to be streamlined/refined!!