Over the past few weeks I have been digging into Lucene and Solr for fun (or what passes for fun here!).

Lucene is a great toolkit. My high-level impressions are that search performance is pretty good but tokenization and indexing performance is not that great. I also think the default ranking algorithms could use a little work; I got some strange results on some test searches I ran. Bear in mind that this is a toolkit, and work is needed to get it running.

This is where Solr comes in: it provides a nice wrapper around Lucene, making it (relatively) easy to create and manage indices.
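As a rough illustration of what that wrapper looks like from the outside, Solr exposes search over HTTP, so a query is just a URL. The sketch below only builds such a URL for Solr's `/select` handler and does not contact a server. The host, port, and the `blogs` core name are assumptions for illustration, `build_select_url` is a hypothetical helper, and the exact URL layout can differ between Solr versions.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, rows=10):
    """Build a search URL for Solr's /select handler (hypothetical helper).

    base_url is assumed to point at a Solr core, e.g.
    http://localhost:8983/solr/blogs (an illustrative value).
    """
    # q, rows and wt are standard Solr query parameters;
    # urlencode takes care of escaping characters such as ':'.
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/select?{params}"


if __name__ == "__main__":
    # Assumed local Solr instance and core name, for illustration only.
    print(build_select_url("http://localhost:8983/solr/blogs", "title:lucene"))
```

Fetching that URL against a running Solr instance should return the matching documents in JSON, which is a good deal less work than driving the Lucene API directly.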

Luke is also worth checking out. It lets you dig into a Lucene index, which is great for looking under the hood and understanding what is going on.

In addition to the Lucene and Solr web sites, IBM developerWorks has a two-part article on Solr (part 1, part 2).


  1. noel says:

    Have you ever looked at mnogosearch? That’s what got used for the WiserEarth project. It was horribly complicated to configure. We had to hire the developer to help, and my overall impression of it was that it was very brittle. But after probably 100s of hours of tweaking, it does yield good result sets…

  2. I have heard of it but I have never looked at it. It sounds like it may be more trouble than it is worth.

  3. thedjinn says:

    I have been using Solr with a little bit of tuning to search through 6.5 million records. It works beautifully.

  4. Lucene worked well in my test; as I said, the search speed was very good. I was able to index about 200 million blog posts on a 32-bit Linux machine, at which point I started hitting some Java limits. It did take a week to parse and index all that data, though.

