Crawling is indeed harder than it looks
May 10, 2008
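I wrote the version one crawler for Feedster (version zero was not very good and was ditched very quickly), and I can say that writing a good crawler is very difficult. It is basically a balancing act: currency versus bandwidth usage, etc. Crawl too often and you waste bandwidth on feeds that have not changed; crawl too rarely and your content goes stale.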
I finished writing a crawler a month or so ago for the project I am currently working on, and it took me a while to get it to adjust the crawl interval based on how frequently a feed changes. I am not sure I have it quite right yet, and the algorithm still needs more adjustment.
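
To make the idea concrete, here is a minimal sketch of one common way to adapt a crawl interval. It is not the algorithm used in either crawler mentioned above. The function name `next_interval`, the 15 minute floor, the one day ceiling, and the 0.5 and 1.5 factors are all assumptions picked for illustration: shorten the interval when a fetch finds new content, lengthen it when nothing has changed, and clamp the result.

```python
# Hypothetical bounds, in seconds. These values are assumptions, not tuned numbers.
MIN_INTERVAL = 15 * 60        # never poll a feed more often than every 15 minutes
MAX_INTERVAL = 24 * 60 * 60   # never wait longer than one day between polls


def next_interval(current: float, changed: bool,
                  speed_up: float = 0.5, back_off: float = 1.5) -> float:
    """Return the next crawl interval for a feed, in seconds.

    If the last fetch found new content, poll sooner (multiply by speed_up).
    If nothing changed, poll later (multiply by back_off).
    The result is clamped to [MIN_INTERVAL, MAX_INTERVAL].
    """
    factor = speed_up if changed else back_off
    return max(MIN_INTERVAL, min(MAX_INTERVAL, current * factor))


if __name__ == "__main__":
    # Simulate a feed that is quiet for three polls and then updates.
    interval = 60 * 60  # start at one hour
    for changed in (False, False, False, True):
        interval = next_interval(interval, changed)
        print(f"changed={changed!s:<5} next poll in {interval / 60:.0f} min")
```

The two factors are where the balancing act shows up: a more aggressive `speed_up` favors currency, while a larger `back_off` saves bandwidth on quiet feeds. Picking them well is the part that takes the adjusting.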