I have been reading about various experiments that use MySQL Proxy to handle sharding (and, by extension, scaling) for applications by rewriting SQL queries as they come through and directing them to the appropriate shards.
The most visible project seems to be HScale, which is well worth looking at and reading about.
The premise is very compelling, which is to remove the issue of sharding from the application layer, moving it into the database layer. This makes the application less complex because it no longer needs to deal with sharding (though it could be argued that sharding, if correctly done, has very little ‘imprint’ on the application.)
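To make that 'imprint' concrete, here is a minimal sketch of the kind of routing logic an application carries when it handles sharding itself, and which a proxy-based approach would take over. The host names and the `shard_for` helper are hypothetical, chosen for illustration only; this is not how HScale or MySQL Proxy implement it.

```python
import zlib

# Hypothetical shard layout: four MySQL instances (host names are made up).
SHARDS = [
    "db0.example.internal",
    "db1.example.internal",
    "db2.example.internal",
    "db3.example.internal",
]


def shard_for(key: str, shards=SHARDS) -> str:
    """Map a sharding key (for example a user id) to one shard, deterministically."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    return shards[zlib.crc32(key.encode("utf-8")) % len(shards)]
```

The application would call `shard_for("user:42")` before every query to pick a connection. Note that simple modulo routing like this remaps most keys when the number of shards changes, which is one of the reasons sharding is harder than it first looks.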
I think this project has promise, but there are some questions that need to be addressed before it is really ready to be used in a production setting:
- First is that the MySQL Proxy introduces a single point of failure. If it fails, the application stops. At the very least, there needs to be a number of proxies and the application needs to be able to detect when one has failed and switch over to another one. I suspect you could get around that issue with a load balancer.
- Second, sharding does not mean that your application automatically becomes fault tolerant. If you have more machines, the odds of one failing go up, so the proxy needs to be able to handle failing over from a failing server to a backup server.
Both of those are difficult problems to deal with, and like a lot of software projects it is the 20% that is going to take 80% of the time.
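As a rough illustration of the first point, the sketch below shows one way an application could try a list of proxies in order and use the first one that accepts a connection. The endpoint names, the port, and the `connect_with_failover` helper are assumptions made for this example; a real deployment would also need health checks and reconnection logic.

```python
import socket

# Hypothetical proxy endpoints; hosts and port are placeholders.
PROXIES = [
    ("proxy-a.example.internal", 4040),
    ("proxy-b.example.internal", 4040),
]


def tcp_connect(host, port, timeout=1.0):
    """Open a plain TCP connection; raises OSError if the endpoint is unreachable."""
    return socket.create_connection((host, port), timeout=timeout)


def connect_with_failover(endpoints, connect=tcp_connect):
    """Return (endpoint, connection) for the first endpoint that accepts a connection."""
    errors = []
    for host, port in endpoints:
        try:
            return (host, port), connect(host, port)
        except OSError as exc:
            # Remember why this endpoint failed, then try the next one.
            errors.append(f"{host}:{port}: {exc}")
    raise ConnectionError("no proxy reachable: " + "; ".join(errors))
```

This only detects a proxy that refuses or times out on connect. It says nothing about whether the database servers behind the proxy are healthy, which is the second and harder problem.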






MySQLProxy does not necessarily introduce a single point of failure.
Simply run multiple mysqlproxy instances and configure your application to fall back to another instance when one goes down.
As for fault tolerance, MySQLProxy can do R/W splitting (script included), which goes a long way, or be used to create a multimaster setup.
Using it for sharding seems a bit disappointing at the moment: HScale (a sharding-with-MySQLProxy implementation) does not support cross-database sharding, only cross-table sharding (I assume one could use FEDERATED tables, but it’s unclear to me what the pros/cons would be in that case).
Comment by karel1980 — July 22, 2008 @ 2:11 am
You are very right about running multiple MySQLProxies to mitigate the single-point-of-failure issue. My mistake for not seeing that, especially since I developed exactly the same solution for the search engine at Feedster.
I (with others) tested a product for MySQL which would create a multi-master setup (by which I mean that two independent servers maintain the same data). It would read from the least loaded server and write to both, maintaining a log in case one of the servers went down, which it would replay against that server when it came back up, before bringing it online. We did not use this product because it did not work as advertised and it was complicated to install. But I think multi-master with recovery could be very powerful for installations that want large MySQL servers.
Cross-table sharding (splitting up a table across MySQL instances) is already very good and I think would cover 95% of the use cases. But I am not sure I understand cross-database sharding.
Comment by François Schiettecatte — July 22, 2008 @ 4:43 pm