RSS implementation

From WormBaseWiki
Jump to navigationJump to search

OBJECTIVE

Create a simple RSS based notification system to easily track changes to any given object in the database.

Approach

Since our databases are build-use-discard, the RSS system requires a database to track changes over time. Here's how it works.

1. The first time a feed is requested, the appropriate object is fetched with timestamps on.

2. This object is stored in a database with it's modification time set arbitrarily to 2007-01-01.

3. The Tree of the object is parsed.

4. Entries in the object that are *newer* than the arbitrary default modification date are saved, parsed, and linked as appropriate.

5. Entries are concatenated and added to a second "history" table. The data schema formally has a simple one object-many history structure. Coupled with the parsed notes this lets us track many changes for a given object over time -- and even to pinpoint what database build these changes were made.

6. Updated entries are found and stored in the history table, one row for each time anything is updated. Determining new entries and storing them in the database only needs to be done once. For subsequent requests, updated entries are culled directly from the database.

Ramifications

1. Feeds are created dynamically as needed.

2. Database grows organically over time in response to user requests for feeds. No need to precalculate or maintain all RSS feeds as separate files.

3. The RSS feed corresponds to the predominant object displayed on a given page only and not its full contents.

Determining if an object has changed

I considered three options for determining if an object had changed from one release to another.

1. Parsed Ace::Object tree

The method I selected is the slowest but most effective approach. With this approach, it is easily possible to save the entire path to a modified item. This allows for a more meaningful RSS feed with hyperlinked objects.

Alternative approaches:

2. md5_sum of XML dump

For a given class, visit each object. Dump out the data as XML and create an md5sum. Store the object ID, class, date, and the md5sum in a separate database.

With each new release, repeat the procedure. If the md5sum is DIFFERENT, update the record in the database and create a new static RSS feed entry for that object. Otherwise ignore it. I also considered using the HTML content of the page itself but reasoned that small changes in the UI would trigger new entries in the RSS feeds.

Using this approach, we have no way of indicating how the object has changed, only that it has.

3. Parsed XML dump

Parse the XML dump and look for time stamps newer than the last modification date of the object. This approach is rather tedious in terms of displaying useful information as one must climb up from the timestamp in a complicated (Perl) data structure.


Schema

This is a quick hack. I have no idea how well this will scale.


DROP TABLE IF EXISTS `objects`;
CREATE TABLE `objects` (
  `oid` int(11) NOT NULL auto_increment,
  `name` char(35) NOT NULL default '',
  `class` char(35) default '',
  `signature` char(32) NOT NULL default '',
  `last_modification_date` date default '2007-01-01',
  `ace_version` char(5) NOT NULL,
  PRIMARY KEY  (`oid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

DROP TABLE IF EXISTS `history`;
CREATE TABLE `history` (
  `sid` int(11) NOT NULL auto_increment,
  `oid` int(11) NOT NULL,
  `version` char(5) default '',
  `date` date,
  `signature` char(32) NOT NULL default '',
  `notes` text default '',
  PRIMARY KEY  (`sid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;