Full Text Indices in MySQL

Interesting post about full text indices in MySQL:

I rarely use MySQL Fulltext indexes. Their performance is just not good enough, so often its better to just stick with “LIKE” or move to something else like Sphinx, Lucene etc. The only nice thing about them is the ability to compute a match “rank”. Well anyways I had to write a new search plugin for a project that is based around MySQL Fulltext indexes and a match rank and all as well .. except that for some reasons some words just would not produce any results. As I was trying to find a pattern I finally noticed that in my test data some words were used in most rows and exactly those were not matching. Obviously it makes sense to exclude automatically any words that have a very high hit ratio. And indeed the documentation states that by default all words with a hit ratio of over 50% are excluded. Doh!

The performance is indeed not very good. Way back in time Feedster started of using the MySQL full text index and performance really sucked, we gave up after having added about 1 million posts to the database (and handling about 5000 searches a day), and moved to the full text search engine I had written for the task (in all fairness Scott and I were still merging the two systems we had developed, we kept his U/I and database schema, and kept my crawler and full text search engine.)

We moved to a system where the indexer would pull recently added data from the database, index that and make it available for searching, effectively making MySQL a repository. Of course it makes much more sense to have the crawler queue up data for the indexer and bypass the repository completely.

Advertisements