Caribbean Reef Octopus

This little guy is a juvenile Caribbean reef octopus, smaller than my fist. I took quite a few shots, and I was able to get close enough to switch to macro mode on my camera. I was no more than 6 inches away. It got scared and dug itself into the sand under the rock right next to it.

For some reason the eye looks quite mesmerizing.


WordPress spam

For some reason I am starting to get a lot of comment spam on my blog. It started a few days ago, and I am not sure why, because I usually get no comment spam.

MySQL Proxy for sharding

I have been reading about various experiments using MySQL Proxy to handle sharding (and by extension scaling) for applications by rewriting SQL queries as they come through and directing them to the appropriate shards.

The most visible project seems to be HScale, which is well worth looking at and reading about.

The premise is very compelling, which is to remove the issue of sharding from the application layer, moving it into the database layer. This makes the application less complex because it no longer needs to deal with sharding (though it could be argued that sharding, if correctly done, has very little ‘imprint’ on the application.)

I think this project has promise, but there are some questions that need to be addressed before it is really ready to be used in a production setting:

  • First is that the MySQL Proxy introduces a single point of failure. If it fails, the application stops. At the very least, there needs to be a number of proxies and the application needs to be able to detect when one has failed and switch over to another one. I suspect you could get around that issue with a load balancer.
  • Second, sharding does not mean that your application automatically becomes fault tolerant. If you have more machines, the odds of one failing go up, so the proxy needs to be able to handle failing over from a failing server to a backup server.

Both of those are difficult problems to deal with, and like a lot of software projects it is the 20% that is going to take 80% of the time.
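
To make the second point concrete, here is a minimal sketch of the routing decision a sharding layer has to make for every query: pick a shard from a key, then fall back to a backup if the primary is down. This is not how MySQL Proxy or HScale do it. The class, the host names, the modulo-based shard choice and the set of "down" servers are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical router: maps a sharding key to a server, with failover to a backup. */
class ShardRouter {

    /** One shard: a primary server and its backup (host names are made up). */
    static final class Shard {
        final String primary;
        final String backup;

        Shard(String primary, String backup) {
            this.primary = primary;
            this.backup = backup;
        }
    }

    private final List<Shard> shards;

    ShardRouter(List<Shard> shards) {
        this.shards = new ArrayList<>(shards);
    }

    /** Simple modulo sharding on a numeric key, for example a user id (an assumption). */
    int shardIndexFor(long key) {
        return (int) Math.floorMod(key, (long) shards.size());
    }

    /** Returns the primary for the key's shard, or the backup if the primary is marked down. */
    String serverFor(long key, Set<String> downServers) {
        Shard shard = shards.get(shardIndexFor(key));
        if (!downServers.contains(shard.primary)) {
            return shard.primary;
        }
        if (!downServers.contains(shard.backup)) {
            return shard.backup;
        }
        throw new IllegalStateException("No live server for key " + key);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(Arrays.asList(
                new Shard("db1-primary", "db1-backup"),
                new Shard("db2-primary", "db2-backup")));

        Set<String> down = new HashSet<>();
        System.out.println(router.serverFor(5, down)); // primary of the second shard

        down.add("db2-primary");
        System.out.println(router.serverFor(5, down)); // backup of the second shard
    }
}
```

The hard part, of course, is everything this sketch leaves out: detecting that a server is actually down, and keeping the backup in sync with the primary.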

When in Rome…

I have been doing a lot of work parsing feeds (both RSS and ATOM) lately and have been using a tool called “Project ROME” for that. I know there is another tool called Abdera but that only handles ATOM feeds.

The ROME project page describes it as follows:

ROME is an set of open source Java tools for parsing, generating and publishing RSS and Atom feeds. The core ROME library depends only on the JDOM XML parser and supports parsing, generating and converting all of the popular RSS and Atom formats including RSS 0.90, RSS 0.91 Netscape, RSS 0.91 Userland, RSS 0.92, RSS 0.93, RSS 0.94, RSS 1.0, RSS 2.0, Atom 0.3, and Atom 1.0. You can parse to an RSS object model, an Atom object model or an abstract SyndFeed model that can model either family of formats.

Which is what it does, and it does it very well. I have thrown any number of feeds at it and it has handled them without trouble. What I particularly like is that foreign markup is accessible, so any special tags, like those from iTunes and Media RSS, can still be read.

No tool is perfect and there are a few ‘lackings’ in it.

  • For some reason it does not support comment URLs in items. I am not sure why this is the case, since I would have expected it to.
  • Some feeds contain XSL/CSS directives located just before the feed itself. Those are used to direct a browser to “pretty print” the feed when it displays it, rather than showing raw XML. ROME does not like that at all, and this stuff needs to be stripped from the feed before it is handed over for parsing.
  • Some feeds (like the NY Times, ahem…), have lots of null characters past the end of the feed, but which are part of the document. I suspect what is happening somewhere is that the feed is deemed to be longer than it actually is and the empty space is filled with null characters (let us pass on the existential issue of filling empty space with nulls). Those also need to be stripped out.
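
Both kinds of stripping can be done with plain string handling before the feed ever reaches ROME. The following is a minimal sketch, not a definitive implementation. It assumes the styling directives are `<?xml-stylesheet ... ?>` processing instructions and that the whole feed fits in a string. The `FeedCleaner` class and its `clean` method are hypothetical names, not part of ROME.

```java
import java.util.regex.Pattern;

/** Hypothetical pre-processing step applied to a raw feed before parsing. */
class FeedCleaner {

    // Matches <?xml-stylesheet ... ?> processing instructions (assumed form of the directives).
    // DOTALL lets the reluctant match span line breaks inside the instruction.
    private static final Pattern STYLESHEET_PI =
            Pattern.compile("<\\?xml-stylesheet.*?\\?>", Pattern.DOTALL);

    /** Removes null characters and stylesheet processing instructions from the feed text. */
    static String clean(String rawFeed) {
        // Null characters are never valid in XML, so they are dropped wherever they occur,
        // not only after the end of the feed.
        String withoutNulls = rawFeed.replace("\0", "");
        return STYLESHEET_PI.matcher(withoutNulls).replaceAll("");
    }

    public static void main(String[] args) {
        String raw = "<?xml version=\"1.0\"?>\n"
                + "<?xml-stylesheet type=\"text/xsl\" href=\"style.xsl\"?>\n"
                + "<rss version=\"2.0\"><channel><title>Example</title></channel></rss>"
                + "\0\0\0";
        System.out.println(clean(raw));
    }
}
```

The cleaned string can then be wrapped in a reader and handed to the parser as usual. The XML declaration is left alone because the pattern only matches instructions that start with `xml-stylesheet`.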

Unfortunately the last release was made in December 2006, and the project does not seem to have had any work done on it since. Hopefully someone will step up to the plate and take it on. I might when work lets up. The one obvious thing I would do is add Generics to it.