Well, I've come a little late to this blogging thing, partly because I haven't had a huge amount to say which I think would be of much use to others. I still may not, but let's see.
This post, apart from starting with a short introduction, is about PHPDance, a PHP interface to the Sharedance cache server which I wrote a couple of months ago. Sharedance is a distributed object cache much like Danga's Memcached, except that while Memcached only saves data to memory, Sharedance also writes it to the hard disk.
Just what is a distributed object cache?
Well, an object cache is a tool which allows you to store arbitrary data under a key and then retrieve it by that key at a later date. The "distributed" bit refers to how and where that data is stored. In a local object cache such as APC, the data is cached directly on the local machine, whereas a distributed cache spreads its data across multiple machines. There are two main reasons for wanting to do this.
- Performance and making efficient use of the available resources. Imagine you have four web servers, each with 4GB of memory (2GB of which is reserved for the cache), and that you are using a local object cache. Unless you have divided your application across the four servers (i.e. users.example.com, shop.example.com, review.example.com, payment.example.com), which is usually not possible, all four of your servers will be serving up roughly the same data. The effect is that pretty much the same data is cached on each of the four servers, giving you effectively 2GB of cache for your application. However, if you could spread the cache across the four servers, you would quadruple the available cache space to 8GB. This side of things is covered very well by Memcached, with its incredible speed.
- Avoiding the database altogether, which requires resilience. Most of the time you'll be caching data which can also be found in another source such as the database. However, there are situations where you don't want to store data in the database at all, typically when there are likely to be a large number of writes and there won't be much benefit gained from putting them there. Session data is a very good example of this. It would be dangerous to store session data only in Memcached, because if the cache gets full, old session data will fall off the cache. In this situation something like Sharedance is more appropriate, as data is also saved to the hard disk, so old session data is not lost (unless it is given an expiry or is explicitly deleted).
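To put numbers on the first point, here is the arithmetic from the four-server example written out as a tiny script. The variable names are just mine for illustration, and it assumes every local cache ends up holding more or less the same entries.

```php
<?php
// Figures from the example above: four web servers, 2GB of cache each.
$servers = 4;
$cachePerServerGb = 2;

// Local caches: every server holds roughly the same entries,
// so the usable capacity is that of a single server.
$effectiveLocalGb = $cachePerServerGb;

// Distributed cache: each entry lives on one server only,
// so the capacities add up.
$effectiveDistributedGb = $servers * $cachePerServerGb;

echo "local: {$effectiveLocalGb}GB, distributed: {$effectiveDistributedGb}GB\n";
```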
PHPDance provides a clean, object-oriented PHP5 interface to Sharedance. Let's take a look at an example for a setup similar to the one mentioned earlier: four web servers, each running an instance of the cache server (in this case Sharedance).
// The gateway to the cache as a whole
$cache = new Sharedance();
// Register every machine taking part in the cache (same list, same order, everywhere)
$cache->addServer( new SharedanceServer( 'web1.example.com' ) );
$cache->addServer( new SharedanceServer( 'web2.example.com' ) );
$cache->addServer( new SharedanceServer( 'web3.example.com' ) );
$cache->addServer( new SharedanceServer( 'web4.example.com' ) );
$key = 'mykey';
$data_in = 'some data which needs to be cached';
// Store the data under the key, then read it back
$cache->set( $key, $data_in );
$data_out = $cache->get( $key );
Here we create a new instance of the Sharedance object, which acts as our gateway to the cache as a whole, and then add an instance of SharedanceServer for each of the machines we want involved in the cache. Note that this list must be exactly the same everywhere the cache is used. We then store some data with the set method and read it back with the get method. Under the hood, the Sharedance object determines which server the data should be cached on and then caches it there (this is why even the order of the servers must be the same everywhere this is used).
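To see why the order of the servers matters, here is a minimal sketch of one common way of mapping a key to a server: hash the key and take it modulo the number of servers. This is an assumption for illustration only. pickServer() is a hypothetical helper, not part of PHPDance, and PHPDance's own selection logic may well differ.

```php
<?php
/**
 * Pick a server for a key: hash the key, then take the hash modulo
 * the number of servers. Hypothetical helper, not part of PHPDance.
 */
function pickServer(string $key, array $servers): string
{
    // abs() guards against negative crc32() values on 32-bit builds
    $index = abs(crc32($key)) % count($servers);
    return $servers[$index];
}

$servers = ['web1.example.com', 'web2.example.com', 'web3.example.com', 'web4.example.com'];
$reordered = array_reverse($servers);

// Same key, same servers, different order: the key lands on a different machine.
$a = pickServer('mykey', $servers);
$b = pickServer('mykey', $reordered);

echo "original order: $a\nreversed order: $b\n";
```

With a scheme like this, a client with the reversed list would look for 'mykey' on a machine that never received it, which is exactly the kind of mismatch that keeping the list identical everywhere avoids.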
Building In Redundancy
This is all well and good, and the fact that the cache is written to the disk as well is great because it means we don't have to worry about overflowing the cache. However, it's still not very resilient. What happens when one of the machines goes down? We effectively lose all the cached data on it for the period that it's down. This is OK when the data is actually stored in the database, but when the cache is the only place it's stored, it's a bit more difficult. PHPDance addresses this issue with redundant writes. If you ask it to be redundant, it will write to two servers every time you set and then, if the first one is down when you get, it will try the second. It's painfully easy to switch this on: just set the first and only parameter of the Sharedance constructor to true.
$cache = new Sharedance( true );
You should note that redundancy can only work if you have enough servers. If, for example, you only have one server, you obviously cannot enable redundancy. Also, if you have more than one server but one of them has a weighting greater than the total number of servers, it will not work either.
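The write-to-two, read-with-fallback idea can be sketched in a few lines. Everything here is a stand-in I've made up for illustration: FakeServer is an in-memory fake rather than a PHPDance class, and choosing "the next server in the list" as the backup is my assumption, not necessarily what PHPDance does.

```php
<?php
// Hypothetical in-memory stand-in for one cache server; not a PHPDance class.
class FakeServer
{
    public $up = true;
    private $data = [];

    public function set($key, $value)
    {
        if (!$this->up) { return false; }
        $this->data[$key] = $value;
        return true;
    }

    public function get($key)
    {
        if (!$this->up || !array_key_exists($key, $this->data)) { return null; }
        return $this->data[$key];
    }
}

// Assumption: the primary is chosen by hash, the backup is the next server along.
function replicasFor($key, array $servers)
{
    $n = count($servers);
    $primary = abs(crc32($key)) % $n;
    return [$servers[$primary], $servers[($primary + 1) % $n]];
}

// Write to both replicas; succeed if at least one write went through.
function redundantSet($key, $value, array $servers)
{
    $ok = false;
    foreach (replicasFor($key, $servers) as $server) {
        $ok = $server->set($key, $value) || $ok;
    }
    return $ok;
}

// Try the primary first, then fall back to the backup.
function redundantGet($key, array $servers)
{
    foreach (replicasFor($key, $servers) as $server) {
        $value = $server->get($key);
        if ($value !== null) { return $value; }
    }
    return null;
}

$servers = [new FakeServer(), new FakeServer(), new FakeServer(), new FakeServer()];
redundantSet('session:abc', 'user=42', $servers);

// Take the primary down; the read falls through to the backup.
$replicas = replicasFor('session:abc', $servers);
$replicas[0]->up = false;
$value = redundantGet('session:abc', $servers);
```

Note that with a single server the "primary" and "backup" would be the same machine, which is one way to see why redundancy needs enough servers to be meaningful.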
Extend And Improve
Earlier I mentioned that session management is a typical example of where something like this is useful. PHP provides a function for setting custom session handlers, session_set_save_handler(), and SharedanceSession uses it to create a distributed, redundant, Sharedance-backed session handler for PHP5.
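To give a feel for what a cache-backed session handler involves, here is a rough sketch. It is not the SharedanceSession code. It uses the SessionHandlerInterface form (PHP 5.4 and later, with return types that need PHP 7) rather than the callback form, and both ArrayCache and CacheSessionHandler are hypothetical names. ArrayCache is only an in-memory stand-in, and its delete() method is an assumption about what a cache object would offer.

```php
<?php
// Hypothetical in-memory stand-in for the cache; not a PHPDance class.
class ArrayCache
{
    private $data = [];
    public function set($key, $value) { $this->data[$key] = $value; return true; }
    public function get($key) { return isset($this->data[$key]) ? $this->data[$key] : null; }
    public function delete($key) { unset($this->data[$key]); return true; }
}

// Hypothetical handler which stores each session under a "sess_" key in the cache.
class CacheSessionHandler implements SessionHandlerInterface
{
    private $cache;

    public function __construct($cache) { $this->cache = $cache; }

    // Nothing to open or close: the cache object is already connected.
    public function open($path, $name): bool { return true; }
    public function close(): bool { return true; }

    // PHP expects an empty string when the session does not exist yet.
    public function read($id): string
    {
        $data = $this->cache->get('sess_' . $id);
        return $data === null ? '' : $data;
    }

    public function write($id, $data): bool
    {
        return $this->cache->set('sess_' . $id, $data);
    }

    public function destroy($id): bool
    {
        return $this->cache->delete('sess_' . $id);
    }

    // Expiry is left to the cache server in this sketch, so there is nothing to collect.
    public function gc($max_lifetime): int { return 0; }
}

$handler = new CacheSessionHandler(new ArrayCache());
// In a real application you would register it before starting the session:
// session_set_save_handler($handler, true); session_start();

// Exercising the handler directly:
$handler->write('abc123', 'user|i:42;');
$stored = $handler->read('abc123');
$handler->destroy('abc123');
$afterDestroy = $handler->read('abc123');
```

Swapping ArrayCache for a real distributed cache object with the same three methods is the whole trick: PHP calls read and write, and the cache decides which machine the session lives on.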
Check it out at http://sourceforge.net/projects/phpdance/