Lessons Learned – Long Life

Anything you do will have a longer life than you imagine. This is hard to see when you whip up some table or some code thinking “this only needs to last us two weeks.”

Believe me when I say that it will likely still be running in six months’ time, if not more. So it is worth taking the time to do it right, even if you need to do it twice or thrice. I have frequently put something together only to step back and see a better way of doing it once it is done. This is normal, since there are almost always unknowns when you first start on something.

It is always best to take the time to do it right, rather than live with a botch for a long time.

Lessons Learned – It Will Break

When you are heads down in code, building your new all-singing, all-dancing site, it is easy to forget that hardware is not your best friend, especially when you are at the very start of a new venture and strapped for capital (and so don’t have lots of hardware sitting around for backups).

Hardware will break, especially disk drives. I know manufacturers list great MTTFs, but if you have an application which pushes your disk drives, like a DBMS, crawlers, or text processing, then that disk will eventually break. In the first year of Feedster, we broke half our disk drives. Granted, we were leasing our hardware and it was not the best hardware, but you get the point.

The only answers here are backups and redundancy. I know both are tedious and consume time, but it will be a worthwhile investment when that critical piece of hardware breaks.

Digitizing Video

I have been using the EyeTV 250 from Elgato to digitize my VHS video tapes to good effect. The quality is pretty good, though it really depends more on the quality of the source material to begin with and on the quality of the VHS tape player.

The only downside I can see right now is that the marker controls for cutting out the unwanted parts of recordings don’t really allow for very fine control.

Sploggers

Robert Scoble suggests that we get rid of sploggers. Hard to disagree with that.

Right now about half, if not more, of the ping notifications we get either directly or from others are from sploggers wanting us to index their content.

Sploggers have also gotten very good at generating lots of splogs, either hosting them themselves or hosting them on weblog providers. Generation is done through software, of course. I have come across splogging software which makes it very easy to generate large amounts of splog, automatically pinging all the weblog/RSS search engines and disguising the content, either by interspersing the splog posts with genuine posts harvested from genuine blogs (like Robert Scoble’s blog) or from search engines like Feedster, or by adding random text in English or a mix of languages.

The problem is compounded by the fact that there are a lot of rebloggers which aggregate feeds (all very legitimate), along with content sites which allow you to automatically blog articles (also very legitimate), all of which generates lots of duplication.

Finally someone suggested that you could:

look for ‘old’ content that’s being excerpted with a high link to content ratio where the links don’t share a lot in common with the content

This is really difficult. How do you tell which is the original content? How do you tell whether the high link to content ratio is splogging or real? The very nature of blogs generates lots of links.
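
To make the suggestion concrete, here is a minimal sketch of just the link to content part of it, in Python. Everything in it is an assumption made for illustration: the function names, the two thresholds, and the regex-based HTML handling are mine and do not describe how Feedster or anyone else actually does it. It does nothing about finding ‘old’ content or working out which copy is the original.

```python
import re

# Matches a simple anchor tag and captures its href and its visible text.
LINK_RE = re.compile(
    r"<a\s[^>]*href=[\"']([^\"']+)[\"'][^>]*>(.*?)</a>",
    re.IGNORECASE | re.DOTALL,
)
TAG_RE = re.compile(r"<[^>]+>")
WORD_RE = re.compile(r"[a-z0-9]+")


def words(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return WORD_RE.findall(text.lower())


def link_stats(html):
    """Return (link_ratio, overlap) for one post body.

    link_ratio: share of all words that sit inside link text.
    overlap: share of distinct link words that also occur outside links.
    """
    anchor_texts = [TAG_RE.sub(" ", m.group(2)) for m in LINK_RE.finditer(html)]
    anchor_words = [w for text in anchor_texts for w in words(text)]

    # Drop the links entirely, then any remaining tags, to get the plain body.
    body_words = words(TAG_RE.sub(" ", LINK_RE.sub(" ", html)))

    total = len(anchor_words) + len(body_words)
    link_ratio = len(anchor_words) / total if total else 0.0

    anchor_set = set(anchor_words)
    overlap = len(anchor_set & set(body_words)) / len(anchor_set) if anchor_set else 1.0
    return link_ratio, overlap


def looks_suspicious(html, max_link_ratio=0.5, min_overlap=0.2):
    """Flag posts that are mostly links whose text is unrelated to the body.

    Both thresholds are arbitrary placeholders, not tuned values.
    """
    link_ratio, overlap = link_stats(html)
    return link_ratio > max_link_ratio and overlap < min_overlap
```

Even this toy version shows the problem: a perfectly genuine link roundup post is mostly links too, so wherever you set the thresholds you either flag real bloggers or let splogs through.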

This is a very difficult problem to solve, and I don’t believe that there is a simple solution. There are a number of approaches to dealing with this, none of which are perfect and none of which will fully get rid of splogs, and all of which require considerable resources.

Top 100+ List

Feedster made the top 100+ list over on John Battelle’s blog.

Winner-Takes-All?

Rich Skrenta’s post about Google and the Third Age of Computing is starting to make the rounds, here and here.

It would be foolish to ignore Google. It is, after all, the 800 pound gorilla on the block at this time. But others have been there before, IBM and Microsoft to use Rich’s examples, and they are still around and are still forces to be reckoned with.

And to single out Google as the only player would be to ignore everyone else out there. It would be like showing up on the plains of Africa, rightly concluding that the elephant was the largest animal there and ignoring the rest of the ecosystem around you. I would suggest that termites are the dominant player out there and not elephants. I certainly saw more termite hills than elephants when I was out there.

I also saw a lot of volcano chimneys there, which were probably there before humans became bipedal.

Dirigisme Vs. Laissez-Faire

I just came across this piece of news that the Germans have just withdrawn from an agreement with France to develop a search engine called Quaero (Wikipedia).

I have been hearing about this effort on and off for over a year now, Quaero having been announced as an initiative in April 2005. Yet here we are in 2007 and still nothing has shipped.

One has to wonder if the parties found it hard to work together, or whether the split happened because nothing else was.

The whole idea of state sponsored projects to compete with the market is misguided. When the state gets involved in this manner, all major incentives for shipping anything are taken away, so progress, if any, will be very slow.

NY Times on Search

Very interesting article on search in the NY Times.

I especially liked this quote from Esther Dyson:

Ms. Dyson, the technology commentator and Powerset investor, captured the optimism more concisely and with less swagger. “I love Google,” she said, “but I love the march of history.”
