Difference between revisions of "Design Specs: Database Connectivity"

Latest revision as of 18:48, 11 August 2010

Objectives

Part of the WormBase::API. Needs to be able to connect to multiple databases (multiple instances of the same database in some cases). Should be fault tolerant if a server or database crashes.

Implementation

http://bitbucket.org/tharris/wormbase/src/tip/lib/WormBase/API/Role/Service.pm a Moose Role module which defines the common attributes and behavior for different databases.

eg.

'hosts' array reference attribute which has all the available hosts from configuration file.

reconnect: abstract method for the database connection, fail-over is also implementation here:

use Moose 'around' method modifier to check on the db handle every time it is accessed

  around 'dbh' sub {}

Note that attribute accessors also count as methods for the purposes of satisfying the requirements of a role.

do "reconnect" if dbh is not set, undef or time-out(AceDB)
will try to connect for maximum $ReconnectMaxTries(now this is hard coded as 5, will replace by conf) times
host is picked randomly among the pool

  my $host = $self->hosts->[ rand @{$self->hosts} ];

This Role is consumed by each Database Classes(Service module)

each has its own "connect" method and "ping" method(mostly just a wrapper)

AceDB
use Ace;
sub connect {
    my $self = shift;
    my ($host,$port,$user,$pass)=@_;
    return Ace->connect(-host => $host,
			      -port => $self->port,
			      -user=>$self->user,
			      -pass=>$self->pass,
			    );
    #			   @cache);
}

GFF database
use Bio::DB::GFF;
sub connect {
    my $self = shift;
    my ($host,$port,$user,$pass)=@_;
    return Bio::DB::GFF->new( -user => $user,
			      -pass => $pass,
			      -dsn => "dbi:mysql:database=".$self->species.";host=" . $host,
    );
}

Minor changes in main API http://bitbucket.org/tharris/wormbase/src/tip/lib/WormBase/API.pm

took out the BUILD subroutine
modified gff_dsn() method

Testing

prove -lv t/WormBase/API/Object/Gene.t  http://bitbucket.org/tharris/wormbase/src/tip/t/WormBase/API/Object/Gene.t

ok 1 - use WormBase::API;
ok 2 - Constructed WormBase::API object ok
[DEBUG] acedb dbh for specie  doesn't exist or is not alive; trying to connect
[INFO] trytime 1: Connecting to  acedb
[DEBUG]      using the following parameters:
[DEBUG]        aceserver.cshl.edu:2005
[INFO]    --> succesfully established connection to  acedb on aceserver.cshl.edu
[DEBUG] acedb dbh for specie  exists and is alive!
ok 3 - The object isa WormBase::API::Object::Gene
...
...
[DEBUG] gff dbh for specie c_elegans doesn't exist or is not alive; trying to connect
[INFO] trytime 1: Connecting to  gff
[DEBUG]      using the following parameters:
Use of uninitialized value in concatenation (.) or string at /home/xshi/work/project/wormbase/website-2.0/lib/WormBase/API/Role/Service.pm line 106.
[DEBUG]        aceserver.cshl.edu:
[INFO]    --> succesfully established connection to  gff on aceserver.cshl.edu
[DEBUG] gff dbh for specie c_elegans exists and is alive!
[DEBUG] gff dbh for specie c_elegans exists and is alive!
...

Question

When Ace time out would the dbh becomes undef? How useful is the Ace reopen method? For now I just re-connect.

Continued Work

setting up attributes such as hosts, user, password, caching for AceDB etc through Configuration file.

 Catalyst::Plugin::ConfigLoader

use Ace (); to prevent eg. the connect sub from Ace being imported into our namespaces.

Things to think about

persistent connections

Apache::DBI DBI connect_cached DBIx::Class

Also, because Apache::DBI's connection caching never checks to see if the process and/or thread id changed, it will give you an invalid $dbh from a previous process/thread if you use a connection, then thread/fork, then try to re-connect to the same source, which is a Bad Thing.

DBIx::Class already manages its connections for you, and therefore it cannot benefit from Apache::DBI under any scenario. It makes one connection per-process, and keeps that connection persistent, reconnecting only if the connection appears to have died, or if you fork/thread over to another process/thread-id. The only Apache::DBI issue in DBIx::Class is that Apache::DBI will actually thwart DBIx::Class's connection management code, and cause it to use the same (and invalid) connection in a new process, in cases such as (as mentioned above) if you make a DBI connection before forking in a prefork mod_perl server.

Resources

The current connectivity code is part of the API. The API, in turn, is inside of the lib directory of the WormBase mercurial repository:

http://bitbucket.org/tharris/wormbase/src/tip/lib/WormBase/API/Service/

Currently, each database is a "Service". When the API is instantiated, connections are made to each required service. Database handles are cached. When a database handle is needed, the service first checks to see if it is live, if not it refreshes the connection. This would probably be the ideal location to implement failover -- check the current connection, if dead, try a different database server.

Note that attribute accessors also count as methods for the purposes of satisfying the requirements of a role.

http://bitbucket.org/tharris/wormbase/src/9009075c4c5a/lib/WormBase/API/Role/Service/gff.pm


  around 'dbh' => sub {
    my $orig = shift;
    my $self = shift;
    
    my $species = $self->species;
    
    # Do we already have a dbh? HOW TO TEST THIS WITH HASH REF?
    if ($self->has_dbh) {
	$self->log->debug("     gff-dbh for $species exists and is alive!");
	return $self->$orig;
    } else {
	$self->log->debug("     gff-dbh for $species doesn't exist yet; trying to connect");
	my $dbh = $self->connect($species);
    }
 };

Note also that these Services use Moose "Roles" to define shared methods and variables like connect().

Here's the acedb service:

http://bitbucket.org/tharris/wormbase/src/tip/lib/WormBase/API/Service/acedb.pm

The acedb service "consumes" the generic Service role:

http://bitbucket.org/tharris/wormbase/src/9009075c4c5a/lib/WormBase/API/Role/Service.pm

This nomenclature and structure is a bit confusing and probably needs to be streamlined/refined!!

Difference between revisions of "Design Specs: Database Connectivity"

Latest revision as of 18:48, 11 August 2010

Contents

Objectives

Implementation

Testing

Question

Continued Work

Things to think about

persistent connections

Resources

Navigation menu

Page actions

Page actions

Personal tools

Navigation

Search

Tools

@@ Line 4: / Line 4: @@
 databases (multiple instances of the same database in some cases).
 Should be fault tolerant if a server or database crashes.
+==Implementation==
+http://bitbucket.org/tharris/wormbase/src/tip/lib/WormBase/API/Role/Service.pm
+a Moose Role module which defines the common attributes and behavior for different databases.
+eg.
+'hosts' array reference attribute which has all the available hosts from configuration file.
+reconnect: abstract method for the database connection, fail-over is also implementation here:
+*use Moose 'around' method modifier to check on the db handle every time it is accessed
+   around 'dbh' sub {}
+Note that attribute accessors also count as methods for the purposes of satisfying the requirements of a role.
+*do "reconnect" if dbh is not set, undef or time-out(AceDB)
+* will try to connect for maximum $ReconnectMaxTries(now this is hard coded as 5, will replace by conf) times
+* host is picked randomly among the pool
+   my $host = $self->hosts->[ rand @{$self->hosts} ];
+This Role is consumed by each Database Classes(Service module)
+*http://bitbucket.org/tharris/wormbase/src/tip/lib/WormBase/API/Service/acedb.pm
+*http://bitbucket.org/tharris/wormbase/src/tip/lib/WormBase/API/Service/gff.pm
+each has its own "connect" method and "ping" method(mostly just a wrapper)
+<pre>
+AceDB
+use Ace;
+sub connect {
+    my $self = shift;
+    my ($host,$port,$user,$pass)=@_;
+    return Ace->connect(-host => $host,
+			      -port => $self->port,
+			      -user=>$self->user,
+			      -pass=>$self->pass,
+			    );
+    #			   @cache);
+}
+GFF database
+use Bio::DB::GFF;
+sub connect {
+    my $self = shift;
+    my ($host,$port,$user,$pass)=@_;
+    return Bio::DB::GFF->new( -user => $user,
+			      -pass => $pass,
+			      -dsn => "dbi:mysql:database=".$self->species.";host=" . $host,
+    );
+}
+</pre>
+Minor changes in main API http://bitbucket.org/tharris/wormbase/src/tip/lib/WormBase/API.pm
+* took out the BUILD subroutine
+* modified gff_dsn() method
+==Testing==
+ prove -lv t/WormBase/API/Object/Gene.t  http://bitbucket.org/tharris/wormbase/src/tip/t/WormBase/API/Object/Gene.t
+<pre>
+ok 1 - use WormBase::API;
+ok 2 - Constructed WormBase::API object ok
+[DEBUG] acedb dbh for specie  doesn't exist or is not alive; trying to connect
+[INFO] trytime 1: Connecting to  acedb
+[DEBUG]      using the following parameters:
+[DEBUG]        aceserver.cshl.edu:2005
+[INFO]    --> succesfully established connection to  acedb on aceserver.cshl.edu
+[DEBUG] acedb dbh for specie  exists and is alive!
+ok 3 - The object isa WormBase::API::Object::Gene
+...
+...
+[DEBUG] gff dbh for specie c_elegans doesn't exist or is not alive; trying to connect
+[INFO] trytime 1: Connecting to  gff
+[DEBUG]      using the following parameters:
+Use of uninitialized value in concatenation (.) or string at /home/xshi/work/project/wormbase/website-2.0/lib/WormBase/API/Role/Service.pm line 106.
+[DEBUG]        aceserver.cshl.edu:
+[INFO]    --> succesfully established connection to  gff on aceserver.cshl.edu
+[DEBUG] gff dbh for specie c_elegans exists and is alive!
+[DEBUG] gff dbh for specie c_elegans exists and is alive!
+...
+</pre>
+==Question==
+When Ace time out would the dbh becomes undef?
+How useful is the Ace reopen method? For now I just re-connect.
+==Continued Work==
+setting up attributes such as hosts, user, password, caching for AceDB etc through Configuration file.
+  Catalyst::Plugin::ConfigLoader
+use Ace (); to prevent eg. the connect sub from Ace being imported into our namespaces.
+==Things to think about==
+===persistent connections===
+Apache::DBI        DBI  connect_cached     DBIx::Class
+Also, because Apache::DBI's connection caching never checks to see if
+the process and/or thread id changed, it will give you an invalid $dbh
+from a previous process/thread if you use a connection, then
+thread/fork, then try to re-connect to the same source, which is a Bad
+Thing.
+DBIx::Class already manages its connections for you, and therefore it
+cannot benefit from Apache::DBI under any scenario.  It makes one
+connection per-process, and keeps that connection persistent,
+reconnecting only if the connection appears to have died, or if you
+fork/thread over to another process/thread-id.  The only Apache::DBI
+issue in DBIx::Class is that Apache::DBI will actually thwart
+DBIx::Class's connection management code, and cause it to use the same
+(and invalid) connection in a new process, in cases such as (as
+mentioned above) if you make a DBI connection before forking in a
+prefork mod_perl server.
 ==Resources==
@@ Line 12: / Line 131: @@
 Currently, each database is a "Service".  When the API is instantiated, connections are made to each required service.  Database handles are cached.  When a database handle is needed, the service first checks to see if it is live, if not it refreshes the connection.  This would probably be the ideal location to implement failover -- check the current connection, if dead, try a different database server.
+*Note that attribute accessors also count as methods for the purposes of satisfying the requirements of a role.
+http://bitbucket.org/tharris/wormbase/src/9009075c4c5a/lib/WormBase/API/Role/Service/gff.pm
+<pre>
+  around 'dbh' => sub {
+    my $orig = shift;
+    my $self = shift;
+    my $species = $self->species;
+    # Do we already have a dbh? HOW TO TEST THIS WITH HASH REF?
+    if ($self->has_dbh) {
+	$self->log->debug("     gff-dbh for $species exists and is alive!");
+	return $self->$orig;
+    } else {
+	$self->log->debug("     gff-dbh for $species doesn't exist yet; trying to connect");
+	my $dbh = $self->connect($species);
+    }
+ };
+</pre>
 Note also that these Services use Moose "Roles" to define shared methods and variables like connect().
@@ Line 24: / Line 166: @@
 This nomenclature and structure is a bit confusing and probably needs to be streamlined/refined!!
+[[Category:Developer documentation]]