HITS & PageRank

By way of Greg Linden: an article from Microsoft Research titled “HITS on the Web: How does it Compare?” by Marc Najork, Hugo Zaragoza, and Michael Taylor. Finally, a large-scale study of several ranking algorithms (HITS, BM25F) using web crawl data.

You should read Greg’s post as well as the comments on it; I am in general agreement with them.

There are a couple of points I would make though:

  • There is a tendency to think of Google’s ranking algorithm as PageRank, but I think it is now more (if not much more) than PageRank. The last time I talked to a Googler, they told me that they used 126 signals (their term) to rank a page. Clearly they have gone beyond PageRank *.
  • I would have liked to see Lucene tested here as well, to see how its ranking algorithm fares.

* I think Google went beyond PageRank pretty quickly, since it is a well-documented and well-understood algorithm and can therefore easily be gamed.
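
To make the "well understood" point concrete, here is a minimal sketch of textbook PageRank as a power iteration over a toy link graph. This is the classic published formulation, not whatever Google runs in production; the function name `pagerank`, the damping factor of 0.85, the iteration count, and the four-page `toy_web` graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict mapping each page to its score (scores sum to 1).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Pages with no outgoing links spread their rank evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}

        # Every other page splits its rank equally among its outgoing links.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank

    return rank


# Hypothetical toy graph: "d" exists only to point a link at "c".
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
```

Because a page's score depends only on who links to it, adding pages like "d" that exist solely to link to "c" pushes "c" up, which is the basic idea behind link farms and the reason a purely link-based score is easy to game.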


