François Schiettecatte’s Blog

May 17, 2008

TCP tuning and others

Filed under: Scaling — François Schiettecatte @ 7:54 am

This article has some interesting suggestions on tuning TCP on linux for machines which have to support a lot of network IO.

Also mentioned are some other suggestions for tuning the linux machines in general, setting such things as vm.swappiness and ‘noatime’ on file systems.

Talking about ‘noatime’, I did a lot of testing on that and found that it really reduced disk I/O on drives where lots of files are accessed frequently such as cache files. My policy on this is to set ‘noatime’ on all partitions unless you need to know when files were last accessed.

May 15, 2008

To use or not to use Memcached, that is the question

Filed under: Scaling — François Schiettecatte @ 11:49 am

Parvesh Garg asks the question whether to use memcached or not. Actually he asks a number of good questions as follows:

1. What is the total size of your data? It might be a possibility that you can keep the data in memory in each node, or MySQL can just keep the whole thing (data+indexes) in a buffer.

2. How frequently your data is updated? Very frequent updates may lead to low cache hit ratio for memcached data. And refreshing memcached too many times may lead to unnecessary overhead. Remember doing [get,get,set] vs [get].

3. What is the peak load on your system? Consider if MySQL itself can handle the peak load or otherwise if even memcached cannot handle the peak load with given infrastructure.

I have written about this before, here and here, as has Greg Linden.

At the risk of sounding like a broken record, the approach I take is to default to MySQL and if needed use Memcached to boost performances. Initially at Feedster we used Memcached to boost performance, but when we got a powerful enough server for MySQL we did away with Memcached because it was not longer needed and, indeed, was more of a hassle than a benefit because we needed to administer additional machines.

Besides which, I think it makes a LOT more sense to invest in partitioning your data and spreading it across a number of MySQL servers (on servers which would otherwise be used for Memcached).

A wise man once told me that a well designed database server should be CPU bound and not I/O bound, meaning that all the indices and data you use reside in memory and not on disk. To me that sounds just like what Memcached does.

May 10, 2008

Scaling MySQL at Facebook

Filed under: Feedster, Scaling, Search — François Schiettecatte @ 10:16 am

By way of Greg Linden, some interesting notes and figures from various high traffic web sites on scaling MySQL.

As Greg points out, Facebook’s strategy is to partition the data and spread it across a lot of servers, which is pretty much the only way to go if you want to scale MySQL, or any site for that matter.

Crawling is indeed harder than it looks

Filed under: Feedster, Scaling, Search — François Schiettecatte @ 10:10 am

Greg Linden (a must-read blog because he picks up new publications very quickly) has a good post aggregating a number of papers from WWW 2008 on crawling and why crawling is hard.

I wrote the version one crawler for Feedster (version zero was not very good and got ditched very quickly) and it is very difficult to write a good crawler. It is basically a balancing act, currency versus bandwidth usage, etc…

I finished writing a crawler a month or so ago for the current project I am working on and it took me a while to adjust the crawl interval based on how frequently a feed changed. I am not sure I have it quite right yet and the algorithm still needs more adjustment.

May 9, 2008

MySQL Forge

Filed under: Scaling, Software Development — François Schiettecatte @ 12:29 pm

I came across MySQL Forge a few days ago when Brian Moon was asking about sample my.cnf configuration files for MySQL. The samples provided in the distribution are very old, very, very old, and more up-to-dates ones are needed.

So I uploaded a copy of the my.cnf configuration file which I have used for database servers sporting 16GB of RAM. You can check out all the same my.cnf which have been uploaded here.

I have to admit mine is sparsely commented, I usually just point to the relevant place in the MySQL documentation, but it does the job.

Boston scalability user group meeting

Filed under: Scaling, Software Development — François Schiettecatte @ 12:16 pm

There is another Boston scalability user group meeting coming up on May 28th. I was not able to attend the last two but I think I will go to this one because it looks interesting:

Orion Letizi from Terracotta will conduct an interactive session exploring the open-source Terracotta project. Terracotta is useful in many different cases including use as Network Attached Memory for working with large datasets, clustering HTTP sessions, reducing database load, acting as a second-level cache for Hibernate objects and more. You can find out more about some of these use cases by visiting the Start Learning Terracotta page.

April 18, 2008

MySQL Proxy for sharding

Filed under: Scaling, Software Development — François Schiettecatte @ 7:41 am

I have been reading about various experiments using MySQL Proxy to handle sharding (and by extension scaling) for application by rewriting SQL queries as they come through and directing them to the appropriate shards.

The most visible project seems to be HScale, which is well worth looking at and reading about.

The premise is very compelling, which is to remove the issue of sharding from the application layer, moving it into the database layer. This makes the application less complex because it no longer needs to deal with sharding (though it could be argued that sharding, if correctly done, has very little ‘imprint’ on the application.)

I think this project has promise but there are some questions that needs to be addressed before it is really ready to be used in a production setting:

  • First is that the MySQL Proxy introduces a single point of failure. If it fails, the application stops. At the very least, there needs to be a number of proxies and the application needs to be able to detect when one has failed and switch over to another one. I suspect you could get around that issue with a load balancer.
  • Second sharding does not mean that your application automatically becomes fault tolerant. If you have more machines, the odds of one failing go up, so the proxy needs to be able to handle failing over from a failing server to a backup server.

Both of those are difficult problems to deal with, and like a lot of software projects it is the 20% that is going to take 80% of the time.

April 10, 2008

Read replication with MySQL - part deux

Filed under: Feedster, Scaling, Search, Software Development — François Schiettecatte @ 1:32 pm

Following up on my last post on read replication with MySQL, I read this post by Greg Linden on the subject of caching which mirrors my thinking on the matter (except that his is better written):

My opinion on this differs somewhat. I agree that read-only replication is at best a temporary scaling solution, but I disagree that object caches are the solution.

I think caching is way overdone, to the point that, in some designs, the caching layers sometimes contains more machines than the database layer. Caching layers add complexity to the design, latency on a cache miss, and inefficiency to use of cluster resources.

My experience at Feedster confirms this, once we got powerful enough servers for the DBMS, we found that we did not need to use memcached at all, in fact it was a hinderance more than anything because it added to the number of machines that needed to be administered.

As a small postscriptum, this post by Ronald Bradford does a very good job of listing out the reasons for replication along with the advantages and disadvantages of each.

April 9, 2008

Being lazy sometimes pays off

Filed under: Scaling, Software Development — François Schiettecatte @ 5:04 pm

Interesting post on Gojko.net on how to make web sites go faster.

The crux of the article is summed up in these three points:

  • Delegate all long operations to a background process
  • Never ever talk to an external system synchronously, no matter how fast it is
  • Be lazy – if something does not have to be processed now, leave it for later

The one that resonated with me is the third one. The last search engine I built depended on two external libraries to support very specific foreign languages. One of these libraries took a while to initialize so I would only do it only when I knew it was going to be needed rather than doing the initialization with every new search that came in.

April 8, 2008

Read replication with MySQL

Filed under: Feedster, Scaling, Software Development — François Schiettecatte @ 4:31 pm

I have been following the thread about the death of read replication over on the Planet MySQL weblog with interest. In with this issue the notion of caching is thrown in to illustrate that it can be used as a substitute to read replication. (See this, this and this.)

Personally I think the two issues are separate and should be treated as such, and I will be basing this on my experiences at Feedster scaling a MySQL database from about 1GB to around 1.5TB.

Initially we relied on read-replication to shift the read load from the master server to alternative read servers. For a while this worked, but as our hardware got better (post-funding) we found that the read servers were not keeping up with replication. After some amount of digging and consultation, what became very clear to me was that the read servers were never going to catch up for a very simple reason.

While the master server and the read servers were roughly the same in terms of capacity, the issues were that the read server was having to support the same write load as the master server and, in addition, a much higher read load. Combine that with the fact that replication takes places in a single thread (whereas the master uses multiple threads to write data), and you have a situation where the read servers cannot catch up with the master server.

There are a couple of tricks you can employ to make the slave servers faster, one is to do the replication across multiple threads using a script (which I have done) but you lose referential integrity, the other is to write a utility which pre-reads the replication log and accesses relevant rows before they are accessed to make sure that replication is not slowed down waiting for data to be read off storage (this was the solution implemented by YouTube for a while).

Looping back to read replication. I agree that read replication is dead, and it should be. Replication should be used for backup purposes only, which is what we eventually did at Feedster. And your replication server should be ready to take over if the master server fails.

Onto the second issue of caching. The caching that memcached does is actually pretty simplistic. You can cache a ‘chunk of data’ somewhere and access it later if it has not been flushed to make room for other ‘chunks of data’. I say ‘chunk of data’ because that is how memcached sees it, you are responsible for serializing the data (flattening it in a contiguous area of memory) and decoding it when you get it back. Caching makes sense if it takes you more time to get data out of your database than it does getting it from cache. Ideally you want to be in a situation where you don’t need to use caching because you get get to your data fast enough. Getting to that point means having an optimized schema and a sharded database so you can take advantage of the bandwidth that multiple machine afford you. The point is to take the memory you would use for caching and give it to your database servers.

Older Posts »

Blog at WordPress.com.