Universal Search at Google

Interesting discussion by David Bailey and Johanna Wright about the new universal search at Google (via Greg Linden).

They talk about the challenge of searching lots of “additional content types”; I am guessing they mean indices.

One challenge was being able to regularly search through all of the additional content types to find relevant results. After all, you don’t know if there might be a minor news story or an obscure book relevant to your query unless you go and check. But Google’s massive compute cluster — and much effort by our infrastructure experts — gave us a leg up on that one, and we can now search these disparate types of information about as efficiently as we search our massive index of web pages. We may have melted down a data center or two along the way, but then bugs are part of life in this business!

Having a very large cluster would really help (I am guesstimating 500,000 machines now, with about 20% dead), as would having the data replicated and distributed, so I am not sure this would have been a major challenge. I would also guess that the “additional content types” take up much less space, and therefore need fewer resources to search, than the main index of web pages.

Now comes the real challenge:

The next challenge was deciding when and where such results should blend in. Fortunately we have some of the world’s experts on ranking, and have been able to apply the lessons learned on web search to ensure that we show news only for newsworthy queries, scanned books only when there aren’t better web results, etc. It can be tricky. As we learned the hard way, just because everyone under the sun is writing about Anna Nicole Smith doesn’t mean news about her should show up for the search [baby names].

This is much more interesting. Establishing relevance across disparate sources of information is very difficult. I have had some success with this across similar sources of information, by getting a relevance measure that was comparable across result sets.
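To make "a relevance measure that is comparable across result sets" concrete, here is a minimal sketch of one textbook approach: rescale each source's raw scores to the range 0 to 1 (min-max normalization) and then merge. This is not necessarily what I did, and certainly not a description of what Google does; the names `normalize` and `blend` and the sample scores are made up for illustration.

```python
from typing import Dict, List, Tuple

Result = Tuple[str, float]  # (document id, raw score from its own index)


def normalize(results: List[Result]) -> List[Result]:
    """Rescale one source's raw scores to [0, 1] using min-max normalization."""
    if not results:
        return []
    scores = [score for _, score in results]
    low, high = min(scores), max(scores)
    if high == low:
        # All scores are equal, so nothing separates them; treat each as top-ranked.
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (score - low) / (high - low)) for doc, score in results]


def blend(sources: Dict[str, List[Result]], limit: int = 10) -> List[Tuple[str, str, float]]:
    """Merge several result sets into one list ordered by normalized score."""
    merged = []
    for name, results in sources.items():
        for doc, score in normalize(results):
            merged.append((name, doc, score))
    # Python's sort is stable, so ties keep the order in which sources were given.
    merged.sort(key=lambda item: item[2], reverse=True)
    return merged[:limit]


if __name__ == "__main__":
    # Hypothetical raw scores on very different scales.
    sample = {
        "web": [("w1", 12.0), ("w2", 8.0), ("w3", 4.0)],
        "news": [("n1", 0.9), ("n2", 0.3)],
    }
    for source, doc, score in blend(sample, limit=3):
        print(source, doc, score)
```

The weakness of this sketch is also the interesting part: the best result from every source always ends up with a score of 1.0, so normalization alone cannot tell you whether a source deserves to appear for a given query at all. That decision needs some extra signal about the query itself.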

Lastly:

Lastly, we faced the challenge of the user interface you see on the screen — the UI. The new UI for these results is subtle, but this is one reason why the project is fun for our designers and usability experts: they get to focus on creating a simple experience for you. For example, with news results they designed a compact look for the result that includes helpful items like an image and a date, but is limited to just the most salient information. Or take our book search results, which call out the author and number of pages in the book.

This is also very interesting. Google has traditionally had a very sparse user interface, which was a breath of fresh air when it came out. At the time, busy portals were all the rage and most of their user interfaces looked like angry fruit salads, which was not a pleasant experience. It was a bit like the time when laser printers first came out and people discovered fonts and had to use them all. In fact, I took inspiration from the Google UI when I designed the UI for ScienceServer: clean and simple.

So incorporating all these new elements into the UI while keeping it simple would have been quite a challenge, and I think the results look really good.
