A friend recently referred me to the Facebook engineering blog.
The most recent post, “Scaling Out”, describes how they added extra data to their MySQL replication stream to keep a distributed cache in sync. It is a good approach, but instead of adding data to the SQL statement itself, I would have added a new message type to the replication stream (the binary log, in fact). I think there are about 10 different message types defined right now and only 1 or 2 in use, so this would not have been too difficult to do and would have prevented ‘pollution’ of the SQL statement.
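To make the idea concrete, here is a toy sketch in Python. It does not reflect MySQL's actual binary log format or Facebook's implementation; the names `QueryEvent`, `CacheInvalidateEvent`, and `apply_stream` are hypothetical and only illustrate carrying the cache keys in their own event type, so the SQL text passes through unmodified.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class QueryEvent:
    """A replicated SQL statement, passed through unmodified."""
    sql: str


@dataclass
class CacheInvalidateEvent:
    """Hypothetical extra event type: cache keys made stale by a write."""
    keys: List[str]


Event = Union[QueryEvent, CacheInvalidateEvent]


def apply_stream(events: List[Event], executed: List[str], cache: Dict[str, str]) -> None:
    """Replay a stream on a replica: run the SQL, purge the stale cache keys."""
    for event in events:
        if isinstance(event, QueryEvent):
            # Stand-in for executing the statement on the replica.
            executed.append(event.sql)
        elif isinstance(event, CacheInvalidateEvent):
            for key in event.keys:
                # Drop the stale entry; a missing key is not an error.
                cache.pop(key, None)
        # Any other event type is skipped, as a consumer that does not
        # understand it might do.


if __name__ == "__main__":
    cache = {"user:42": "old name", "user:7": "bob"}
    executed: List[str] = []
    stream: List[Event] = [
        QueryEvent("UPDATE users SET name = 'alice' WHERE id = 42"),
        CacheInvalidateEvent(["user:42"]),
    ]
    apply_stream(stream, executed, cache)
    print(executed)
    print(cache)
```

The point of the separate event is that a consumer which only cares about SQL never has to parse or strip anything extra from the statement.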
The post on “Facebook Chat” mirrors the “Lessons In Building Scalable Systems” presentation that Google gave at the Google Scalability Conference in 2007.
They also describe Thrift, which is now in the Apache Incubator:
Thrift is a cross-language serialization and RPC framework. It combines a powerful software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, C#, Erlang, Perl, and several other languages. Thrift was originally developed at Facebook and open-sourced in April, 2007. Thrift entered the Apache Incubator in May, 2008.
Reading the description, I was struck by the parallels between Thrift and Google's Protocol Buffers. If you are going to develop applications that communicate over a network, you will need to look at both.