HITS & PageRank

By way of Greg Linden: an article from Microsoft Research titled “HITS on the Web: How does it Compare?” by Marc Najork, Hugo Zaragoza, and Michael Taylor. Finally, a large-scale study of several ranking algorithms (HITS, BM25F) using web crawl data.

You should read Greg’s post as well as the comments on it; I am in general agreement with them.

There are a couple of points I would make though:

  • There is a tendency to think of Google’s ranking algorithm as PageRank, but I think it is now more (if not much more) than PageRank. The last time I talked to a Googler, they told me that they used 126 signals (their term) to rank a page. Clearly they have gone beyond PageRank *.
  • I would have liked to see Lucene tested here too, to find out how well its ranking algorithm fares.

* I think Google went beyond PageRank pretty quickly, since it is a well-documented and well-understood algorithm and can therefore easily be gamed.
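
To make the "easily gamed" point concrete, here is a minimal sketch of the textbook PageRank power iteration. It is not Google's production ranking and not the setup used in the Microsoft Research study; the function name `pagerank`, the damping factor of 0.85, the iteration count, and the toy graph with its "farm" pages are all assumptions made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration (illustrative sketch).

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: nobody links to "b".
web = {"a": ["c"], "b": ["c"], "c": ["a"]}

# The same web plus five made-up "farm" pages that exist only to link to "b".
farmed = dict(web)
for i in range(5):
    farmed[f"farm{i}"] = ["b"]

print(pagerank(web)["b"], pagerank(farmed)["b"])
```

In this toy graph, adding the farm pages raises the share of rank that "b" receives, even though nothing about "b" itself changed. That is the kind of manipulation a purely link-based score invites, and presumably one reason to combine it with many other signals.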
