January 24, 2009
Interesting list of distributed key-value stores:
Perhaps you’re considering using a dedicated key-value or document store instead of a traditional relational database. Reasons for this might include:
- You’re suffering from Cloud-computing Mania.
- You need an excuse to ‘get your Erlang on’.
- You heard CouchDB was cool.
- You hate MySQL, and although PostgreSQL is much better, it still doesn’t have decent replication. There’s no chance you’re buying Oracle licenses.
- Your data is stored and retrieved mainly by primary key, without complex joins.
- You have a non-trivial amount of data, and the thought of managing lots of RDBMS shards and replication failure scenarios gives you the fear.
Whatever your reasons, there are a lot of options to choose from. At Last.fm we do a lot of batch computation in Hadoop, then dump it out to other machines where it’s indexed and served up over HTTP and Thrift as an internal service (stuff like ‘most popular songs in London, UK this week’). Presently we’re using a home-grown index format which points into large files containing lots of data spanning many keys, similar to the Haystack approach mentioned in this article about Facebook photo storage. It works, but rather than build our own replication and partitioning system on top of this, we are looking to potentially replace it with a distributed, resilient key-value store for the last three reasons above.
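To make the "index pointing into large files" idea concrete, here is a minimal sketch. It is not Last.fm's actual format or Facebook's Haystack: the `BlobStore` class and its `put`/`get` methods are hypothetical names made up for illustration, and the index lives in a plain in-memory dict rather than an on-disk structure.

```python
import io


class BlobStore:
    """Hypothetical sketch: one large append-only data file plus an
    in-memory index mapping each key to (offset, length) in that file."""

    def __init__(self, data_file):
        self.data = data_file   # any seekable binary file object
        self.index = {}         # key -> (offset, length)

    def put(self, key, value):
        # Append the value to the end of the big file and remember where it is.
        self.data.seek(0, io.SEEK_END)
        offset = self.data.tell()
        self.data.write(value)
        self.index[key] = (offset, len(value))

    def get(self, key):
        # One index lookup, one seek, one read.
        offset, length = self.index[key]
        self.data.seek(offset)
        return self.data.read(length)
```

Passing an `io.BytesIO()` or a file opened in `"w+b"` mode is enough to try it out. Note what the sketch leaves out: replication and partitioning, which is exactly the part the quoted post would rather not build by hand.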
At Feedster we initially used MySQL as an RDBMS and quickly switched to using it pretty much as a key-value store. That is a little simplistic: we did run selects to gather multiple tuples, but we eschewed anything more complex than a simple select … from table where … (basically single-table selects).
I am not ready to jump ship to an Anti-RDBMS just yet. I still need to be able to get multiple tuples containing a specific indexed value, so select … from table where … is still required.
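For illustration, here is roughly what that usage pattern looks like. This is a sketch under assumptions, not Feedster's schema: it uses Python's built-in `sqlite3` in place of MySQL, and the `items` table, its `id`/`feed`/`body` columns, and the helper functions are all hypothetical.

```python
import sqlite3


def open_store():
    # In-memory SQLite stands in for MySQL here (an assumption for the sketch).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, feed TEXT, body TEXT)")
    # Secondary index so lookups by a non-key value stay cheap.
    conn.execute("CREATE INDEX items_feed ON items (feed)")
    return conn


def put(conn, item_id, feed, body):
    # Key-value style write: insert or overwrite by primary key.
    conn.execute(
        "INSERT OR REPLACE INTO items (id, feed, body) VALUES (?, ?, ?)",
        (item_id, feed, body),
    )


def get(conn, item_id):
    # Key-value style read: a single-table select by primary key.
    row = conn.execute("SELECT body FROM items WHERE id = ?", (item_id,)).fetchone()
    return row[0] if row else None


def get_by_feed(conn, feed):
    # Still a single-table select, but returning multiple tuples that
    # share an indexed value. A pure key-value store has no direct equivalent.
    rows = conn.execute(
        "SELECT id, body FROM items WHERE feed = ? ORDER BY id", (feed,)
    )
    return rows.fetchall()
```

`get` is all a key-value store gives you; `get_by_feed` is the part I would miss.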