Read replication with MySQL

I have been following the thread about the death of read replication over on the Planet MySQL weblog with interest. Mixed in with this issue is the notion of caching, thrown in to illustrate that it can be used as a substitute for read replication. (See this, this and this.)

Personally I think the two issues are separate and should be treated as such, and I will base what follows on my experience at Feedster scaling a MySQL database from about 1GB to around 1.5TB.

Initially we relied on read replication to shift the read load from the master server to dedicated read servers. For a while this worked, but as our hardware got better (post-funding) we found that the read servers were not keeping up with replication. After some amount of digging and consultation, it became very clear to me that the read servers were never going to catch up, for a very simple reason.

While the master server and the read servers were roughly equivalent in capacity, the issue was that each read server had to support the same write load as the master server and, on top of that, a much higher read load. Combine that with the fact that replication takes place in a single thread (whereas the master uses multiple threads to write data), and you have a situation where the read servers can never catch up with the master server.
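For what it is worth, the lag is easy to watch for yourself. Below is a minimal sketch, assuming the PyMySQL driver and an account with REPLICATION CLIENT privileges, that polls a read server and prints the Seconds_Behind_Master value from SHOW SLAVE STATUS; the host name and credentials are placeholders.

```python
# Sketch: poll a replica for Seconds_Behind_Master to watch it fall behind.
# Assumes the PyMySQL driver and an account with REPLICATION CLIENT privileges;
# host, user and password below are placeholders.
import time
import pymysql

def replication_lag(host, user, password):
    conn = pymysql.connect(host=host, user=user, password=password,
                           cursorclass=pymysql.cursors.DictCursor)
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW SLAVE STATUS")
            row = cur.fetchone()
            # None means replication is not configured or the SQL thread stopped.
            return row["Seconds_Behind_Master"] if row else None
    finally:
        conn.close()

if __name__ == "__main__":
    while True:
        lag = replication_lag("read-server-1", "monitor", "secret")
        print("replication lag:", lag if lag is not None else "unknown (SQL thread stopped?)")
        time.sleep(10)
```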

There are a couple of tricks you can employ to make the slave servers faster. One is to do the replication across multiple threads using a script (which I have done), but you lose referential integrity. The other is to write a utility which pre-reads the replication log and touches the relevant rows before the replication thread needs them, so that replication is not slowed down waiting for data to be read off storage (this was the solution YouTube implemented for a while).
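The second trick is easier to see with a sketch. The idea is to rewrite the write statements coming through the replication log into equivalent SELECTs and run them slightly ahead of the slave's SQL thread, so the rows are already in memory by the time the change is applied. The sketch below is only an illustration of that idea, not the utility YouTube actually used: it assumes statement-based replication, mysqlbinlog output piped in on stdin, placeholder credentials, and deliberately naive regexes.

```python
# Sketch of the "prefetch" trick: rewrite UPDATE/DELETE statements from the
# replication log into SELECTs and run them ahead of the slave SQL thread so
# the affected rows are already cached when replication applies the change.
# Assumes statement-based replication and mysqlbinlog output on stdin; the
# regexes are naive and would need hardening for real SQL.
import re
import sys
import pymysql

UPDATE_RE = re.compile(r"^UPDATE\s+(\S+)\s+SET\s+.+?\s+WHERE\s+(.+)$", re.IGNORECASE)
DELETE_RE = re.compile(r"^DELETE\s+FROM\s+(\S+)\s+WHERE\s+(.+)$", re.IGNORECASE)

def to_prefetch_select(statement):
    """Turn a write statement into a SELECT that touches the same rows."""
    for pattern in (UPDATE_RE, DELETE_RE):
        m = pattern.match(statement.strip().rstrip(";"))
        if m:
            table, where = m.group(1), m.group(2)
            return f"SELECT 1 FROM {table} WHERE {where}"
    return None  # INSERTs and anything unrecognised are skipped

if __name__ == "__main__":
    conn = pymysql.connect(host="read-server-1", user="prefetch",
                           password="secret", autocommit=True)
    with conn.cursor() as cur:
        for line in sys.stdin:  # e.g. mysqlbinlog relay-log | python prefetch.py
            select = to_prefetch_select(line)
            if select:
                cur.execute(select)  # result is discarded; the point is the I/O
                cur.fetchall()
    conn.close()
```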

Looping back to read replication. I agree that read replication is dead, and it should be. Replication should be used for backup purposes only, which is what we eventually did at Feedster. And your replication server should be ready to take over if the master server fails.

Onto the second issue of caching. The caching that memcached does is actually pretty simplistic. You can cache a ‘chunk of data’ somewhere and access it later if it has not been flushed to make room for other ‘chunks of data’. I say ‘chunk of data’ because that is how memcached sees it: you are responsible for serializing the data (flattening it into a contiguous area of memory) and decoding it when you get it back.

Caching makes sense if it takes you more time to get data out of your database than it does to get it from the cache. Ideally you want to be in a situation where you don’t need caching at all because you can get to your data fast enough. Getting to that point means having an optimized schema and a sharded database, so you can take advantage of the bandwidth that multiple machines afford you. The point is to take the memory you would use for caching and give it to your database servers.
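To make the serialization point concrete, here is a minimal cache-aside sketch, assuming the pymemcache and PyMySQL clients; the key scheme, TTL and query are made-up examples. Memcached hands back opaque bytes, so the caller has to encode on the way in and decode on the way out, and the cached value can be evicted at any time.

```python
# Minimal cache-aside sketch with memcached. The client stores opaque bytes,
# so we serialize with json.dumps going in and json.loads coming out.
# The pymemcache/PyMySQL clients, key scheme, TTL and query are illustrative.
import json
import pymysql
from pymemcache.client.base import Client

cache = Client(("127.0.0.1", 11211))

def get_user(conn, user_id, ttl=300):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: decode the flattened bytes

    with conn.cursor(pymysql.cursors.DictCursor) as cur:
        cur.execute("SELECT id, name, email FROM users WHERE id = %s", (user_id,))
        row = cur.fetchone()

    if row is not None:
        cache.set(key, json.dumps(row), expire=ttl)  # may be evicted at any time
    return row
```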