iPod Touch/iPhone memory bump

It was nice to see Apple bump the memory on the iPod Touch and the iPhone. But I am holding out for a memory-bumped iPod Nano. I currently have an 8GB iPod Nano (the previous generation) and I typically upgrade when I can double the amount of storage on the device.


Stop words and minimum term length

This post on stop words and minimum term length by Peter Zaitsev reminded me of some search engine dos and don’ts that I posted back in August last year.

To summarize:

Stop lists are evil; don’t use them. Modern machines have enough capacity to index, store and search over very large quantities of text. Typically I have found that there is only a 5% difference in index size if you add a stop word list.

There should be no minimum term length; you want to be able to search for “Vitamin A”.

Case is important. The approach I take is to index all terms in lowercase, and also index mixed case terms as they are. Search is always done in the case supplied by the user, so “New York Times” would only find documents which contain capitalized terms, and “new york times” would find all documents which contain the terms regardless of case and capitalization.
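To make the case handling concrete, here is a minimal sketch of that approach in Python. It is only an illustration, not the code I actually run: the names `build_index` and `search` are made up for this example, tokenization is a plain whitespace split, and the index is a simple in-memory dictionary.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():  # assumption: naive whitespace tokenization
            lowered = token.lower()
            index[lowered].add(doc_id)  # every term is indexed in lowercase
            if token != lowered:
                index[token].add(doc_id)  # mixed-case terms are also indexed as they are
    return index


def search(index, query):
    """Return ids of documents containing every query term, in the case supplied."""
    terms = query.split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "I read the New York Times today",
    2: "new york times are changing",
}
index = build_index(docs)

print(sorted(search(index, "New York Times")))  # [1]
print(sorted(search(index, "new york times")))  # [1, 2]
```

The capitalized query only matches the document that contains the capitalized terms, while the lowercase query matches both. The cost is one extra posting for every mixed-case token.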

Tokenization is important; check my original post on that.

Plural stemming is the way to go; anything more (like Porter or Lovins) will just increase the ‘noise’ in the search results.
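As a rough idea of how little a plural-only stemmer needs to do, here is a small Python sketch. The rules and the name `plural_stem` are assumptions chosen for illustration, not the exact rules I use, and they expect lowercase English input.

```python
def plural_stem(term):
    """Strip common English plural endings from a lowercase term."""
    if len(term) <= 3:
        return term  # leave very short terms alone ("a", "is", "bus")
    if term.endswith("ies") and not term.endswith(("eies", "aies")):
        return term[:-3] + "y"  # queries -> query
    if term.endswith("es") and not term.endswith(("aes", "ees", "oes")):
        return term[:-1]  # engines -> engine
    if term.endswith("s") and not term.endswith(("us", "ss")):
        return term[:-1]  # vitamins -> vitamin
    return term  # class, status and non-plurals are unchanged


for word in ["queries", "engines", "vitamins", "class", "status"]:
    print(word, "->", plural_stem(word))
```

The stems do not have to be dictionary words, they only have to be the same for the indexed term and the query term. Because only plural endings are touched, words like "organization" and "organ" stay distinct, which is the kind of noise a heavier stemmer can introduce.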

There is more in the post and I should revise it sometime, maybe this weekend.