RSS coming to the iPhone

At this point nothing has been confirmed but it looks like RSS may be coming to the iPhone.

Frankly I would be surprised if it did not make its way to the iPhone. While RSS itself excites a small minority, what it enables in terms of information delivery is much more exciting to a much larger majority.

Putting a good RSS reader on the iPhone would allow users to have lots of good information delivered for reading and viewing (hopefully it will support enclosures).

And the AJAX support in the browser would allow the creation of widgets (or gadgets, or whatever) which deliver information via RSS.

Feedster just launched such a widget maker.

I wish someone would find another name for that technology. RSS is very geeky IMHO; we need the name to recede into the background. Users don’t think HTTP, HTML, AJAX, etc… they just want stuff to work for them.

New look at Feedster

Feedster has a new look and new features, channels and widgets, allowing you to build what they are calling feedwidgets for your blog and/or web site.

I personally know most of the people who worked on this new design and features, props to them for putting this together.

Scaling up vs. scaling out

Interesting post on scaling up vs. scaling out from Jay Pipes.

Worth reading if you plan to build the next killer site using MySQL.

There is an interesting comment in the post:

Anyone who wants to design their database architecture so that it’ll allow them to inexpensively grow from one box rank nothing to the top ten or hundred sites on the net should start out by designing it to handle slightly out of date data from replication slaves, know how to load balance to slaves for all read queries and if at all possible to design it so that chunks of data (batches of users, accounts, whatever) can go on different servers. You can do this from day one using virtualisation, proving the architecture when you’re small. It’s a LOT easier than doing it while load is doubling every few months!

I used to think that using replication slaves as read servers to take load off the master was a good idea. Given past experience, I am not so sure now. The issue is that replication slaves will fall behind if there is any kind of load on them, which will happen because they have to absorb the same write load as the master and very likely a heavy read load too. Things are further stacked against you because slaves are usually underpowered compared to the master (who wants to use their best machines for slaves, right?), and because the slave applies replicated writes in a single thread, whereas the master handles writes on multiple threads.
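If you do send reads to slaves, the lag problem suggests routing around stale ones. Here is a minimal sketch of lag-aware read routing; the `Replica` struct, host names, and threshold are hypothetical, but the lag figure is the kind of thing MySQL reports as `Seconds_Behind_Master` in `SHOW SLAVE STATUS`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A replica and how far it lags the master. MySQL exposes this as
// Seconds_Behind_Master in SHOW SLAVE STATUS; -1 here means replication
// is broken on that slave.
struct Replica {
    std::string host;
    int seconds_behind_master;
};

// Route a read: prefer the least-lagged usable slave, but fall back to
// the master when every slave is broken or too stale for this query.
std::string pick_read_host(const std::vector<Replica>& slaves,
                           const std::string& master,
                           int max_lag_seconds) {
    const Replica* best = nullptr;
    for (const Replica& s : slaves) {
        if (s.seconds_behind_master < 0) continue;                // broken
        if (s.seconds_behind_master > max_lag_seconds) continue;  // too stale
        if (best == nullptr ||
            s.seconds_behind_master < best->seconds_behind_master) {
            best = &s;
        }
    }
    return best != nullptr ? best->host : master;
}
```

Note that the fallback path means a lag spike silently pushes read load back onto the master, which is exactly when the master can least afford it.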

I now think it is much better to use the replication slaves as hot backups that can take over should the master fail (using similar machines for the master and the slaves), and to shard your data to scale out.
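The commenter’s point about designing from day one so chunks of data can move between servers can be sketched as a two-level shard lookup. The names and the 256-shard count here are illustrative assumptions, not a prescription:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Two-level lookup: user -> logical shard -> physical server. With many
// more logical shards than servers, rebalancing means reassigning a few
// logical shards to a new server rather than rehashing every user.
// kLogicalShards and the modulo scheme are illustrative choices.
constexpr int kLogicalShards = 256;

int logical_shard(std::uint64_t user_id) {
    return static_cast<int>(user_id % kLogicalShards);
}

std::string server_for_user(std::uint64_t user_id,
                            const std::vector<std::string>& shard_map) {
    // shard_map has kLogicalShards entries, each naming the server
    // currently responsible for that logical shard.
    return shard_map[logical_shard(user_id)];
}
```

Starting out, all 256 logical shards can map to one physical box (the “prove it while you’re small” idea from the quote); growing means editing the map, not the hashing scheme.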

Google performance tools – part quatre

I have written three posts (part 1, part 2 & part 3) about the Google performance tools, in which I claimed that I did not see much improvement from using the alternative memory allocator.

It turns out that my test was flawed. The test I ran was to create an in-memory index, a process which requires the allocation of lots of little bits of memory. Unfortunately that process runs as a single thread, which is not the use case the alternative memory allocator is supposed to address. It is supposed to address the case where there are lots of threads running, all allocating memory.

I set up and ran a different test. A search engine I have been playing around with here lets me run it either as a number of processes or as a number of threads (or both). So I took a chunk of data (2GB of text), created an index (2GB of index), and proceeded to run a large number of tests, setting the search engine up first as 20 instances with a single thread each, and then as one instance with 20 threads.
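The allocation pattern being exercised can be reproduced with a small churn harness; this is a sketch, not my actual test code, and the block sizes and counts are arbitrary assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Each worker thread repeatedly allocates and frees small, varied-size
// blocks, mimicking the many-little-allocations pattern of an index or
// search workload.
void churn(int iterations) {
    std::vector<char*> blocks;
    blocks.reserve(1024);
    for (int i = 0; i < iterations; ++i) {
        blocks.push_back(new char[16 + (i % 241)]);
        if (blocks.size() == 1024) {
            for (char* p : blocks) delete[] p;
            blocks.clear();
        }
    }
    for (char* p : blocks) delete[] p;
}

// Run the churn on the given number of threads and return elapsed
// wall-clock milliseconds.
long long run_benchmark(int threads, int iterations) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back(churn, iterations);
    }
    for (std::thread& th : pool) th.join();
    return std::chrono::duration_cast<std::chrono::milliseconds>(
               std::chrono::steady_clock::now() - start)
        .count();
}
```

Compile with `-pthread`, then run the same binary twice: once as-is (glibc malloc) and once with `LD_PRELOAD` pointing at the gperftools tcmalloc shared library (the exact path is installation-dependent), and compare the timings at various thread counts.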

I am not going to publish the raw data here, it is not all that interesting, but I will describe what I noticed.

When running the search engine as 20 instances with a single thread each, the Google alternative memory allocator actually delivers about 10% worse performance than the glibc memory allocator with the search cache turned off, but about 10% better performance with the search cache turned on.

When running the search engine as one instance with 20 threads, the results are quite different. The Google alternative memory allocator delivers about 10-15% better performance than the glibc memory allocator with the search cache turned off, and 25-30% better performance with the search cache turned on.

What is also interesting is the shape of the performance curves. When running 20 single-threaded instances, throughput peaks at 4 concurrent clients (the machine sports dual Xeon processors with hyperthreading) at 520 searches/sec and then flattens out at 500 searches/sec. When running one instance with 20 threads, throughput peaks at 860 searches/sec and then drops, flattening out at 570 searches/sec.

So my results bear out that the Google alternative memory allocator delivers when used in a threaded application.

“Human Touch”

The NY Times has an article wondering if the ‘Human Touch’ will loosen Google’s grip on the search industry.

First the obvious comments. The press thrives on controversy; it makes for interesting news which sells. There also seems to be this obsession with looking for the next Google. I see it in the press and the VCs: everyone wants to find, and back, the next Google, thereby securing fame and/or fortune (most likely both).

Now for the less obvious comments.

Matt Cutts (by way of John Battelle) verbalizes this one much more eloquently than I ever could, so go and read his article. The crux of his argument is that even though the article draws a contrast between algorithm based search engines (cold machines) and social based search engines (warm fuzzy humans), the former are built by humans and rely on data created and compiled by humans. I do know that many engineers (like me) pour their heart and soul into the systems they develop, so all systems have very human roots.

Google is very deeply ensconced in the market in ways which make it very difficult to dislodge in the short term. AdWords has deep roots all over the internet, their search works well enough, and their applications are good enough, smart enough, and doggone it, people like them. Their share price strongly suggests that Wall Street is very confident that there is plenty of market share and revenue left for them to grow into.

Even if a Google ‘killer’ appeared on the scene, I don’t feel there is enough oxygen in the market for a newcomer to take them on. Even Microsoft, Yahoo and Ask, all of whom have deep pockets and roughly comparable technology, are slowly being asphyxiated.

From a strategic point of view, an eye needs to be kept on the following:

  • There will be a Google ‘killer’ at some point, but not in the short-term, and probably not in the medium-term. The chink in their armor has yet to reveal itself.
  • Any Google ‘killer’ has to be much, much better than Google, simply because people feel that Google’s search engine is just ‘better’.
  • Google is not standing still, they will keep improving.
  • Google needs to pay attention to their market share, making sure that healthy competition exists in the market; otherwise the trust-busters will come calling. Google needs Microsoft, Yahoo, Ask, Amazon and eBay to keep competition healthy and keep everyone honest. This only helps the customer.

While I don’t feel there is currently much oxygen in the market for a Google ‘killer’, there is plenty of room for vertical search engines which combine web based & user generated content, provide tools to engage users such as blogs and forums, and some sort of commercial component.

Don’t get me wrong, I like Google, and I use their products every day because they are good. It has been a very interesting ten years since they came on the scene; the next ten will prove an interesting ride, and I would not miss it for the world.

Update: SearchEngineLand also has a post on this with additional links.

Google Conference on Scalability

The Google Conference on Scalability is over and was very interesting.

The organization was first rate, and the food was very good too (I am an unapologetic foodie). The presentations were also very good and very informative. Most people were from the Seattle area and the San Francisco area, some from the East Coast, and some came in from as far as England.

The first keynote, “MapReduce, BigTable, and Other Distributed System Abstractions for Handling Large Datasets” by Jeff Dean, Google, Inc., was interesting. All the material presented was already out there in the form of other presentations and papers, but Jeff really tied things together and provided some good insight into the tools and how they are used at Google. One very cogent point he made was that giving developers powerful tools allows them to be much more productive, and also allows them to take on challenges that they otherwise could not. Both points are very important, and the first one really resonated with me: the less you have to worry about infrastructure as a developer, the more you can focus on the problems at hand.

The session on the Lustre file system, “Lustre File System” by Peter Braam, Founder and President, Cluster File Systems, Inc., covered really big file systems and how you make them scale in the face of heterogeneity and unreliability.

Barry Brumitt’s presentation “Using MapReduce on Large Geographic Datasets” was entertaining and interesting, providing insight into the technology and processes that went into building Google Maps. Which got me thinking about whether you could adapt the technology to build maps of the sky. I am sure you could and I think it would be a very worthwhile project.

Reza Behforooz’s talk, “Lessons in Building Scalable Systems” (not listed in the program), provided insight into the engineering process at Google: how they test scaling and how they deploy systems. This was very interesting, as I had not seen this information before.

The second keynote “Marissa Mayer, Vice President, Search Products & User Experience, Google, Inc., Topic TBD.” provided insight into user testing at Google, talking about how they tested various user interfaces, and the work they did on Universal Search. Nothing really new here, but there was much more content in the Q&A session afterwards.

The final talk I went to was “Challenges in Building an Infinite Scalable Datastore” by Swami Sivasubramanian and Werner Vogels, which proved to be very entertaining. Werner provided some interesting and amusing insights into the situation he found at Amazon when he arrived, and how he addressed it. He draws a lot on biology for inspiration, which makes sense, as a lot of the issues we run into with scaling were solved a long time ago by biological organisms. Swami talked about some of the ways they dealt with scaling and reliability, which are detailed in an upcoming paper. He made the point that elegant academic solutions can be very difficult to implement, and that complex engineering needs to be done to make things work. Werner was a little miffed at the not-so-hidden recruiting pitches Google was making, so he decided to make his own, very overt, recruiting pitch as well.

All the sessions were videotaped and are going to be put up on Google Video and/or YouTube in the near future, at which point I will post a link to them.

As for the recruiting pitches, both Google and Amazon sound like very interesting companies to work for; it is a shame that neither has a strong presence in the Boston area.

Update, June 27th, 2007: Robin Harris of StorageMojo has posted some notes too.

and you’re blogging…

“Here we are at the end of the universe… and you’re blogging…”