François Schiettecatte’s Blog

Technorati stumbles

Posted in Feedster, Search by François Schiettecatte on August 17, 2007

It looks like Technorati has just stumbled.

I am sorry to hear that things are not going well for them, but they are in a tough market, especially since Google got into it with their blogsearch.

Disclaimer – I am a stockholder in Feedster, and still consult with them from time to time.

The Importance of being cached

Posted in Feedster, Search by François Schiettecatte on July 18, 2007

By way of Greg Linden, I read this very interesting paper from Yahoo Research about caching called “The Impact of Caching on Search Engines”.

I liked the discussion on term versus search caching. My experience is that term caching does not really buy you much if all you are doing is caching a posting list, since that is what is stored in the index anyway. Caching terms would make more sense if there is a field restriction on the term, but most terms don’t have field restrictions. Caching a search makes a lot more sense, and caching portions of searches also makes a lot of sense. In the search engine I developed for Feedster, I implemented both. The searches were cached, and the filters in searches were also cached. By filters I mean that we had a number of searches which were restricted to a reduced set of weblogs, and these restrictions were implemented using a filter expression which was separate from the actual user search. This is pretty standard stuff, and I found that caching the filter results improved performance.
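To make the distinction concrete, here is a minimal sketch of a search cache sitting next to a filter cache. It is not the Feedster code: the `CachingSearcher` class, the toy in-memory index and the filter callables are all assumptions made up for illustration. Both caches fill on a miss and never evict anything.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Optional, Tuple

# Hypothetical toy index: term -> ids of the posts containing that term.
Index = Dict[str, FrozenSet[int]]


class CachingSearcher:
    """Caches whole searches and, separately, the result of each filter."""

    def __init__(self, index: Index,
                 filters: Dict[str, Callable[[], Iterable[int]]]):
        self.index = index
        self.filters = filters          # filter name -> callable returning allowed post ids
        self.search_cache: Dict[Tuple[str, Optional[str]], Tuple[int, ...]] = {}
        self.filter_cache: Dict[str, FrozenSet[int]] = {}
        self.filter_evaluations = 0     # counts how often a filter was really evaluated

    def _filter_ids(self, name: str) -> FrozenSet[int]:
        # Evaluate the filter expression once, then reuse it across searches.
        if name not in self.filter_cache:
            self.filter_evaluations += 1
            self.filter_cache[name] = frozenset(self.filters[name]())
        return self.filter_cache[name]

    def search(self, query: str, filter_name: Optional[str] = None) -> Tuple[int, ...]:
        key = (query, filter_name)
        if key in self.search_cache:    # whole-search cache hit
            return self.search_cache[key]

        # AND together the posting lists of every term in the query.
        postings = [self.index.get(term, frozenset()) for term in query.lower().split()]
        hits = frozenset.intersection(*postings) if postings else frozenset()

        # The filter is kept separate from the user search and applied afterwards.
        if filter_name is not None:
            hits = hits & self._filter_ids(filter_name)

        result = tuple(sorted(hits))
        self.search_cache[key] = result
        return result
```

Two different searches restricted to the same set of weblogs share one filter evaluation, which is where the filter cache pays off even when the search cache misses.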

I am not sure where I stand on dynamic versus static caching though, and I am not sure I make much of a distinction. I implemented a dynamic cache, i.e. I would cache the results if they were not already cached, but I did not set a limit on the cache, and I did not ‘warm’ the cache from search logs.

Chad Walters also has some interesting thoughts on this.

The “We’re sorry…” page

Posted in Feedster, Search, Software Development by François Schiettecatte on July 10, 2007

Google has an interesting way of dealing with people who launch too many searches from their computers.

While working at Feedster I routinely saw the same search coming in many times a second, usually from the same subnet or the same computer. In some cases these were due to crawlers gone wild, but in other cases the pattern looked malicious.

What we did to handle that was to put in place a way of measuring the incoming search rate and looking for high numbers of searches from the same subnet or the same computer. When we could contact an admin to check into it we did so, but if there was no contact, we initially throttled the searches, effectively putting in a delay. If that did not help, we would just cut off the IP address.
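A rough sketch of that kind of rate measurement follows. Everything in it is an assumption for illustration: the `SearchRateMonitor` name, the sliding one-second window, grouping IPv4 clients by /24 subnet, and the two thresholds. It also collapses the "throttle first, cut off if that does not help" escalation into a simple pair of limits.

```python
import ipaddress
from collections import defaultdict, deque
from typing import Deque, Dict


class SearchRateMonitor:
    """Counts recent searches per subnet and suggests allow / throttle / block."""

    def __init__(self, window_seconds: float = 1.0,
                 throttle_at: int = 5, block_at: int = 20):
        self.window_seconds = window_seconds
        self.throttle_at = throttle_at   # searches per window before adding a delay
        self.block_at = block_at         # searches per window before cutting off
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    @staticmethod
    def subnet_of(ip: str) -> str:
        # Assumption: treat every IPv4 /24 as one "subnet".
        return str(ipaddress.ip_network(f"{ip}/24", strict=False))

    def record(self, ip: str, now: float) -> str:
        window = self.hits[self.subnet_of(ip)]
        window.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while window and now - window[0] > self.window_seconds:
            window.popleft()

        count = len(window)
        if count >= self.block_at:
            return "block"
        if count >= self.throttle_at:
            return "throttle"
        return "allow"
```

The caller passes in the clock value, so the front end can sleep on "throttle" and refuse on "block". Keying on the single address instead of the subnet only requires changing `subnet_of`.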

I remember two patterns to the searches: one was that a lot of them came from China, and the other was that a lot of them were looking for kiddie porn.

I think Google handles this problem better than we did at Feedster; then again, they have a lot more engineers.

New look at Feedster

Posted in Feedster by François Schiettecatte on June 28, 2007

Feedster has a new look and new features, channels and widgets, allowing you to build what they are calling feedwidgets for your blog and/or web site.

I personally know most of the people who worked on this new design and features; props to them for putting this together.

MySQL Scaleout

Posted in Feedster, Scaling by François Schiettecatte on June 21, 2007

Some notes on MySQL and Wikipedia, sorta thrown in with the MySQL 12 days of Scale-Out.

These notes are very high level but they are very good nonetheless. The way you scale MySQL is to partition your data, and Wikipedia slices by:

  • data segments
  • tasks
  • time

That is the only way to go. The approach I take is that if you can’t slice and dice your data, it will not scale.

At Feedster the data was very time dependent: old posts, new posts, etc. People want to see the new stuff and are not so keen on the old stuff, so we sliced up the posts into segments and delta segments. Lower numbered segments contained older posts, higher numbered segments contained newer posts, and we could replicate those segments across servers.
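As a small illustration of slicing by time, the sketch below cuts posts into numbered segments, oldest first, and searches from the highest numbered segment down so that new posts come back first. The fixed segment size, the `Segment` class and both function names are assumptions, and delta segments and replication are not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Post = Tuple[int, str]  # (timestamp, text)


@dataclass
class Segment:
    number: int                          # lower number = older posts
    posts: List[Post] = field(default_factory=list)


def build_segments(posts: List[Post], segment_size: int) -> List[Segment]:
    """Sort posts by time and cut them into fixed-size, numbered segments."""
    ordered = sorted(posts)
    return [
        Segment(number, ordered[start:start + segment_size])
        for number, start in enumerate(range(0, len(ordered), segment_size))
    ]


def search_newest_first(segments: List[Segment], term: str, limit: int) -> List[Post]:
    """Scan segments from newest to oldest and stop once `limit` hits are found."""
    hits: List[Post] = []
    if limit <= 0:
        return hits
    for segment in sorted(segments, key=lambda s: s.number, reverse=True):
        for timestamp, text in sorted(segment.posts, reverse=True):
            if term in text.lower():
                hits.append((timestamp, text))
                if len(hits) == limit:
                    return hits          # older segments are never touched
    return hits
```

Because most searches are satisfied by the newest segments, the older ones can sit on fewer or slower machines, which is the point of partitioning on time.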

Death Of Froogle

Posted in Feedster, Search by François Schiettecatte on April 19, 2007

It is nice to see that Froogle has morphed into Google Product Search.

It wasn’t the best name one could have thought up.

A little bit of trivia: Feedster used to be called Roogle, a sort of mashup between RSS and Google, but that was changed before Scott and I joined forces. That was a good thing, I think, since it would probably have attracted attention from Google’s lawyers.

ps – Roogle now seems to be owned by a real estate company.

Search In China

Posted in Feedster, Search by François Schiettecatte on April 4, 2007

Aydin Senkut has written an article titled “The Outlook for Search in China” over on Read/Write Web. It is a very good overview of the current search landscape in China as well as trends there.

Aydin Senkut is also an advisor to Feedster.

OpenSearch

Posted in Feedster, Search by François Schiettecatte on April 3, 2007

Early on in Feedster’s life, we had Michael Fagan doing an internship at the company. Among the various things he worked on, he added support for A9’s meta search format (basically RSS with some extensions).

I was looking around the A9 web site for information about it and found that it had been forked out to OpenSearch.org.

Turns out that Michael continued to work on it.

It looks like it has evolved into a very functional spec for supporting meta searching.

Lessons Learnt – Aim in Front of the Target

Posted in Feedster by François Schiettecatte on January 25, 2007

While working at Feedster, I was originally responsible for the crawler, the indexer and the search engine. Since I knew nothing about crawlers (I still don’t, though I know more about them now than I did then), I handed that over to people who were much more competent than me in that area.

So I was left looking after the indexer and the search engine. Initially, back in 2003, our traffic was very low but it kept increasing. As time went on, I would have to re-architect the indexer and the search engine to be able to deal with the increasing amount of data we were crawling and with the increasing number of searches we were getting. The trick was to aim for the growth we were anticipating in six to twelve months’ time and not the growth we were anticipating in two to three months’ time. That way, the indexer and the search engine were able to deal with traffic growth very easily and without causing issues.

The other lesson learnt here was to aim for simplicity. You want to make the data administration easy when you are dealing with large, and increasingly large, amounts of data. You also want to make the system robust: when machines crash, or networks go down, or power goes out, whatever is developed has to deal gracefully with system degradation and has to recover on its own when whatever went down comes back up.

All Good Things…

Posted in Feedster by François Schiettecatte on January 22, 2007

All good things end, and last Friday I left Feedster after spending four years there.

Feedster was founded in early 2003 during a particularly tenebrous Boston winter. Scott Johnson and I had started to build feed search engines independently of each other. After about a month of doing this, we started an email conversation one Friday evening and decided to meet for lunch to talk about what we were doing.

We talked on and off for a few weeks and it became clear to both of us that there was something there, that neither of us could go it alone and that we both had different strengths, so we decided to join forces and Feedster was truly born.

Now after four years of work, it is time for me to move on. Working on Feedster has been a lot of fun, I have been privileged to work with a lot of very smart people, and I have learnt a lot along the way.

Feedster has been through a lot of changes over the years, and the current team has been in place for just over a year. It is a very strong team and I will miss working with them a lot. I have a very deep respect for them and I have made many good friends along the way.

I will be remaining as an advisor to Feedster for the foreseeable future, so I will be in touch with them on a regular basis. And I know that Feedster will go on to greater and greater things.

For my part, I have no immediate plans, save taking some much needed vacations and batting around some ideas that have been running around my head.