The Importance of Being Cached

By way of Greg Linden, I read a very interesting paper on caching from Yahoo Research, “The Impact of Caching on Search Engines”.

I liked the discussion on term versus search caching. My experience is that term caching does not buy you much if all you are caching is a posting list, since the posting list is already what the index stores. Caching terms would make more sense if there were a field restriction on the term, but most terms don’t have field restrictions. Caching a whole search makes a lot more sense, and so does caching portions of searches. In the search engine I developed for Feedster, I implemented both: the searches were cached, and the filters in searches were also cached. By filters I mean that we had a number of searches which were restricted to a reduced set of weblogs, and these restrictions were implemented using a filter expression which was separate from the actual user search. This is pretty standard stuff, and I found that caching the filter results improved performance.
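To make the two levels concrete, here is a minimal sketch in Python of a search cache sitting on top of a filter cache. It is not the Feedster code: the class name `CachingSearcher`, the callables `run_query` and `run_filter`, and the use of document-id sets are all assumptions made for illustration.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Optional, Tuple

DocIds = FrozenSet[int]


class CachingSearcher:
    """Hypothetical searcher with a search cache and a separate filter cache."""

    def __init__(
        self,
        run_query: Callable[[str], Iterable[int]],
        run_filter: Callable[[str], Iterable[int]],
    ) -> None:
        # Both callables stand in for the real index lookups (assumed).
        self._run_query = run_query
        self._run_filter = run_filter
        self._filter_cache: Dict[str, DocIds] = {}
        self._search_cache: Dict[Tuple[str, Optional[str]], DocIds] = {}

    def _filter_docs(self, filter_expr: str) -> DocIds:
        # The filter is evaluated once, then shared by every search using it.
        if filter_expr not in self._filter_cache:
            self._filter_cache[filter_expr] = frozenset(self._run_filter(filter_expr))
        return self._filter_cache[filter_expr]

    def search(self, query: str, filter_expr: Optional[str] = None) -> DocIds:
        key = (query, filter_expr)
        if key in self._search_cache:
            return self._search_cache[key]  # whole search already cached
        hits = frozenset(self._run_query(query))
        if filter_expr is not None:
            # Restrict the user search to the documents the filter allows.
            hits = hits & self._filter_docs(filter_expr)
        self._search_cache[key] = hits
        return hits
```

With this layout, two different user searches restricted to the same set of weblogs evaluate the filter expression only once, which is where the saving from caching filter results would come from.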

I am not sure where I stand on dynamic versus static caching, though, and I am not sure I make much of a distinction. I implemented a dynamic cache, i.e. I would cache the results if they were not already cached, but I did not set a limit on the size of the cache, and I did not ‘warm’ the cache from search logs.
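For comparison, here is a rough sketch of what the distinction could look like in code. It is not what I built: it combines a static part, warmed once from a query log, with a bounded dynamic part that evicts the least recently used entry. The class name `StaticDynamicCache`, the `compute` callable, and the size parameters are hypothetical, and LRU is just one possible eviction policy.

```python
from collections import Counter, OrderedDict
from typing import Callable, Dict, Hashable, Iterable


class StaticDynamicCache:
    """Hypothetical cache: static part warmed from logs, dynamic LRU part."""

    def __init__(
        self,
        compute: Callable[[str], Hashable],
        query_log: Iterable[str],
        static_size: int,
        dynamic_size: int,
    ) -> None:
        self._compute = compute
        # Static part: precompute the most frequent logged queries, never evict.
        top = Counter(query_log).most_common(static_size)
        self._static: Dict[str, Hashable] = {q: compute(q) for q, _ in top}
        # Dynamic part: filled on demand, bounded, least recently used goes first.
        self._dynamic: "OrderedDict[str, Hashable]" = OrderedDict()
        self._dynamic_size = dynamic_size

    def get(self, query: str) -> Hashable:
        if query in self._static:
            return self._static[query]
        if query in self._dynamic:
            self._dynamic.move_to_end(query)  # mark as most recently used
            return self._dynamic[query]
        result = self._compute(query)
        self._dynamic[query] = result
        if len(self._dynamic) > self._dynamic_size:
            self._dynamic.popitem(last=False)  # evict least recently used
        return result
```

An unbounded, unwarmed dynamic cache like the one I describe above corresponds roughly to the dynamic part alone with no size limit.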

Chad Walters also has some interesting thoughts on this.


