To Shard or not to Shard
January 7, 2009 Leave a comment
This post on (not) sharding over at 37 Signals is getting some airtime.
So rather than sharding, they figured that they would just throw more RAM at the database server and scale that way:
Our read performance is in some aspect being taken care of by the fact that you can get machines with 256GB RAM now. We upgraded the Basecamp database server from 32GB to 128GB RAM a while back and we thought that would be the end of it.
The box was maxed out and going beyond 128GB at the time was stupid expensive. But now there’s 256GB to be had at a reasonable price and I’m starting to think that by the time we reach that, there’ll be reasonably priced 512GB machines.
I have to admit that I did a bit of a double take on this, 128GB of RAM is a lot of memory. I can see why they would want to avoid sharding. Sharding introduces all sort of complications, for example getting all rows matching a criteria gets more complicated (you have to address all the shards), doing joins gets more complicated, referential integrity gets more complicated, etc… On the other hand by leaving all the data on a single machine, even if that machine has a lot of RAM, does not remove other bottlenecks such as memory bandwidth, CPU power (to an extent), so there are still going to be limits to how far you can go.
Admittedly sharding is hard and has limited tool support, and those tools which support it are primitive at best, but I think there is much more upside to sharding than downside.
That being said it is a choice that should be based on the requirements at hand. Some of the databases in the current system I am building are sharded and others are not.