Building Scalable Web Sites

I am wrapping up reading “Building Scalable Web Sites: Building, scaling, and optimizing the next generation of web applications” by Cal Henderson, and I strongly recommend it for anyone who wants to build a scalable web site. The book goes over a lot of the gotchas that typically bite people when their web site is discovered and there is suddenly a lot of traffic.

Coverage of the data side of things is a little sparse. If you want to find out how to really scale MySQL, you should check these videos:


Why iPlayer is flawed

I have been reading about iPlayer, mostly criticism. It seems to be a:

Windows XP-only, Windows Media Player-only, Internet Explorer-only, DRM-constrained iPlayer application.

according to Mashable.

And this is its greatest flaw. By taking this approach, the Beeb has pretty much guaranteed that it will be accessible by a small population, thereby limiting its success by default.

It is a shame because I really enjoy the BBC content I have managed to get here in the US, but the Beeb seems to be determined not to make its content available to those who want it.

I use a Mac and I don’t think of it as a minor platform, nor do I think of Linux that way. In fact, in a recent episode of, Patrick Norton was surprised that 20% of the downloads were done from Macs. I was surprised too, though I had expected the number to be lower than that.

Another figure that surprised me was Steve Jobs saying that iTunes had been installed on 300 million computers. That is a huge number and if I were going to distribute audio and/or video content, I would certainly look for ways to leverage that reach.

New blog

I have just created a new blog called Boston Startups in which I will write about the Boston Startup scene as I run into it, which seems to be fairly often these days.


I have written about OpenSearch before, and came across this article on it on

The article doesn’t break any new ground but is a quick overview of the protocol.

iPhone specific sites

Google has created an iPhone-specific search page, which I came across via TUAW.

I tried the search page and it works pretty well for a demo.

What bothered me was this comment in the TUAW post:

But the problem with this goes right back to what Scott was talking about the other day— we aren’t supposed to be getting half the web on the iPhone, we’re supposed to be getting the real web. In this case, there’s not much to complain about– this really is Google, minus the extra content and the ads. However, the links actually go to regular browser windows (not iPhone formatted sites), and if you hit “More Results” at the bottom of the page, it takes you to a normal, full-screen Google page anyway. So what’s the point? Yes, this is just a demo, but why bother making an iPhone specific page in the first place? iPhone users should be able to browse to the Google homepage like everyone else.

While Apple may have meant Safari on the iPhone to allow users to browse the normal web, the reality is that the normal web is designed for displays a lot larger than that on the iPhone, even though it is an exceptional display.

Frankly, when I bring up a complex web page, such as the NY Times front page, on my iPhone, it is nice that I can see the whole page but the text looks like fly poop to me (and I have good vision) and I need to zoom to read anything.

I think it makes much more sense to have web pages specifically designed for mobile devices, so they can be optimized for smaller screens and lower bandwidth (EDGE, ahem).

Functionally I think it is much nicer for the user to get a page which they can read without having to zoom right off the bat and just scroll to read more, than getting a page which needs to be zoomed to read anything at all.

Web Innovators Group meeting

A quick post to say that registration for the 14th Web Innovators Group meeting is open. The meeting will be held on September 10th at the Royal Sonesta in Cambridge.

Privacy, are we having the right debate?

It seems like all the major search engines are falling over themselves announcing new privacy initiatives. All this is very laudable, and I think it is important to have clearly defined privacy policies, but I am wondering if we are actually having the right debate.

I think there are four key questions we need to look at:

  • The first is what data is being stored. Currently consumers generate a lot of data as they browse the web: search histories, pages viewed, email, documents, etc. A lot of that data can be aggregated too, providing a wealth of information. I think we understand that a search engine collects that data, but I am most interested in the intersection of Google and DoubleClick data.
  • The second is what that data is being used for. This flows out naturally from the first question. Looking at search histories and pages viewed, a search engine will be able to detect trends and recommend pages we might not have otherwise found, eventually personalizing the search results. Better ad targeting is a no-brainer too. I am also very interested to know what cross-purposes the data is being put to, for example my search history being used to provide additional signals for ad targeting when I am reading my email online.
  • The third is what the data retention policy is. This is where all the action seems to be these days: how long the data is stored, how long cookies remain active, and when and how data is anonymised. Shortening cookie expirations is privacy theater. It has also been shown that anonymised logs are far from anonymous. There may also be legal requirements to store data for certain lengths of time.
  • The fourth is under what circumstances data is disclosed to law enforcement agencies. This does not seem to have been all that well addressed. For example when the FBI asked the major search engines for data, all but Google rolled over and gave up the data requested. What was interesting about this is that the FBI did not press their case with Google which suggests they were on shaky legal grounds in the first place, yet everyone except Google complied.

I think it is a given that data about our browsing habits will be stored and used. This is the principal manner in which service providers learn about us and have the means to provide a better browsing experience (personalization is a big factor here).

What is important for us consumers to understand is how this data is used, aggregated, disseminated, retained and purged. Once we know that, it will be easier to determine whether the loss of privacy is worth it.

And so far I have yet to see comprehensive information from any service providers about that.