HITS & PageRank
By way of Greg Linden, an article from Microsoft Research titled “HITS on the Web: How does it Compare?” by Marc Najork, Hugo Zaragoza, and Michael Taylor. Finally, a large-scale study of several ranking algorithms (HITS, BM25F) using web crawl data.
You should read Greg’s post as well as the comments on it; I am in general agreement with them.
There are a couple of points I would make though:
- There is a tendency to think of Google’s ranking algorithm as PageRank, but I think it is now more (if not much more) than PageRank. The last time I talked to a Googler, they told me that they used 126 signals (their term) to rank a page. Clearly they have gone beyond PageRank *.
- I would have liked to see Lucene tested here too, to find out how well its ranking algorithm fares.
* I think Google went beyond PageRank pretty quickly, since it is a well-documented and well-understood algorithm and can therefore easily be gamed.





