MySQL Proxy for sharding

I have been reading about various experiments that use MySQL Proxy to handle sharding (and, by extension, scaling) for applications by rewriting SQL queries as they come through and directing them to the appropriate shards.
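To make the idea concrete, here is a minimal sketch of the routing step, written in Python rather than in the Lua that MySQL Proxy scripts use. Everything in it is an assumption for illustration: the host names, the `users_N` table-naming scheme, and the functions `shard_index` and `route_query` are hypothetical and are not taken from HScale or MySQL Proxy. It hashes a sharding key to pick a shard, then rewrites the logical table name to the per-shard table.

```python
import re
import zlib

# Hypothetical shard map: these host names are illustrative only.
SHARDS = [
    "shard0.example.internal",
    "shard1.example.internal",
    "shard2.example.internal",
]


def shard_index(key, shard_count):
    """Map a sharding key to a shard number using a stable hash (CRC32)."""
    return zlib.crc32(str(key).encode("utf-8")) % shard_count


def route_query(sql, key, table, shards=SHARDS):
    """Return (host, rewritten_sql) for a query that touches `table`.

    The rewrite is a naive whole-word substitution; it would also touch
    the table name inside string literals, which a real proxy must avoid.
    """
    index = shard_index(key, len(shards))
    pattern = r"\b%s\b" % re.escape(table)
    rewritten = re.sub(pattern, "%s_%d" % (table, index), sql)
    return shards[index], rewritten


if __name__ == "__main__":
    host, sql = route_query("SELECT * FROM users WHERE id = 42", 42, "users")
    print(host, "<-", sql)
```

The same key always maps to the same shard, which is the property the application would otherwise have to implement itself. A simple modulo scheme like this one makes adding shards painful, because most keys move when the shard count changes.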

The most visible project seems to be HScale, which is well worth looking at and reading about.

The premise is very compelling: remove the issue of sharding from the application layer and move it into the database layer. This makes the application less complex because it no longer needs to deal with sharding (though it could be argued that sharding, if correctly done, has very little ‘imprint’ on the application).

I think this project has promise, but there are some questions that need to be addressed before it is really ready to be used in a production setting:

  • First, the MySQL Proxy introduces a single point of failure. If it fails, the application stops. At the very least, there needs to be a number of proxies, and the application needs to be able to detect when one has failed and switch over to another one. I suspect you could get around that issue with a load balancer.
  • Second, sharding does not mean that your application automatically becomes fault tolerant. If you have more machines, the odds of one failing go up, so the proxy needs to be able to handle failing over from a failing server to a backup server.
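The "detect a failure and switch over" step in both points comes down to the same pattern: try a list of endpoints in order and use the first one that answers. Below is a minimal sketch in Python, under stated assumptions: `connect_with_failover` and `AllEndpointsFailed` are hypothetical names, and `connect` stands in for whatever call your MySQL client library uses to open a connection. No real driver API is shown here.

```python
class AllEndpointsFailed(Exception):
    """Raised when no endpoint in the list accepts a connection."""


def connect_with_failover(endpoints, connect):
    """Try each endpoint in order; return (endpoint, connection) for the first success.

    `connect` is any callable that takes an endpoint and either returns a
    connection or raises OSError (ConnectionError is a subclass of OSError).
    """
    errors = []
    for endpoint in endpoints:
        try:
            return endpoint, connect(endpoint)
        except OSError as exc:
            # Remember the failure and move on to the next endpoint.
            errors.append((endpoint, exc))
    raise AllEndpointsFailed(errors)


if __name__ == "__main__":
    # Simulated connector: the first proxy is "down", the second answers.
    def fake_connect(endpoint):
        if endpoint == "proxy-a:4040":
            raise ConnectionRefusedError("proxy-a is down")
        return "connection to " + endpoint

    print(connect_with_failover(["proxy-a:4040", "proxy-b:4040"], fake_connect))
```

This only covers failure at connection time. A connection that dies mid-query, or a backup server that is behind the primary, needs more than a retry loop, which is where the hard part of the problem lies.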

Both of those are difficult problems to deal with, and, as in a lot of software projects, it is the last 20% that is going to take 80% of the time.


3 Responses to MySQL Proxy for sharding

  1. karel1980 says:

    MySQLProxy does not necessarily introduce a single point of failure.
    Simply run multiple mysqlproxy instances and configure your application to fall back to the other instance when one is going down.

    About fault tolerance, MySQLProxy can do R/W splitting (script included), which goes a long way, OR be used to create a multimaster setup.

    Using it for sharding seems a bit disappointing at the moment: HScale (a sharding-with-MySQLProxy implementation) does not support cross-database sharding, only cross-table sharding (I assume one could use FEDERATED tables, but it’s unclear to me what the pros/cons would be in that case).

  2. You are very right about running multiple MySQLProxies to mitigate the single-point-of-failure issue. My mistake for not seeing that, especially since I developed exactly the same solution for the search engine at Feedster.

    I (with others) tested a product for MySQL that would create a multi-master setup (by which I mean that two independent servers maintain the same data). It would read from the least loaded server and write to both, maintaining a log in case one of the servers went down, which it would replay against that server when it came back up before bringing it online. We did not use this product because it did not work as advertised and it was complicated to install. But I think multi-master with recovery could be very powerful for installations that want large MySQL servers.

    Cross-table sharding (splitting up a table across MySQL instances) is already very good, and I think it would cover 95% of the use cases. But I am not sure I understand cross-database sharding.

  3. Pingback: HScale and MySQL Proxy « François Schiettecatte’s Blog
