Recently I asked a colleague about Java profilers. Four years ago I worked on a Java project (my first) and am currently working on my second one. Profiling was not really an issue on the first project but I am hitting some performance issues with this current one.
He pointed me to YourKit, which looks very full-featured but is not cheap at $500 per license.
I did a little bit of looking around and found JRat, a command line profiling tool which seems to work well for command line applications. You use JRat Desktop (a Swing application) to look at the profile and coverage information, and it does a good job of laying out where time is being spent in the application.
What it is telling me now is that the Java MySQL connector is a real performance hog.
I have been doing a lot of work parsing feeds (both RSS and ATOM) lately and have been using a tool called “Project ROME” for that. I know there is another tool called Abdera but that only handles ATOM feeds.
The ROME project page describes it as follows:
ROME is an set of open source Java tools for parsing, generating and publishing RSS and Atom feeds. The core ROME library depends only on the JDOM XML parser and supports parsing, generating and converting all of the popular RSS and Atom formats including RSS 0.90, RSS 0.91 Netscape, RSS 0.91 Userland, RSS 0.92, RSS 0.93, RSS 0.94, RSS 1.0, RSS 2.0, Atom 0.3, and Atom 1.0. You can parse to an RSS object model, an Atom object model or an abstract SyndFeed model that can model either family of formats.
It does exactly that, and does it very well: I have thrown any number of feeds at it without trouble. What I particularly like is that foreign markup is accessible, so special tags such as the iTunes and Media RSS extensions are still available after parsing.
No tool is perfect and there are a few ‘lackings’ in it.
- For some reason it does not support comment URLs in items. I am not sure why, since I would have expected it to.
- Some feeds contain XSL/CSS directives located just before the feed itself. These direct a browser to “pretty print” the feed rather than display raw XML. ROME does not like them at all, so they need to be stripped from the feed before it is handed over for parsing.
- Some feeds (like the NY Times, ahem…) have lots of null characters past the end of the feed that are nonetheless part of the document. I suspect that somewhere the feed is deemed to be longer than it actually is and the empty space is filled with null characters (let us pass on the existential issue of filling empty space with nulls). Those also need to be stripped out.
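The two clean-up steps in the list above can be sketched with nothing but the standard library. This is a minimal sketch, not what ROME or my own code does verbatim: the class name `FeedCleaner` and the method `clean` are made up for illustration, and it assumes the feed has already been read into a `String` and that the styling directives are `<?xml-stylesheet ...?>` processing instructions.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of ROME): cleans a feed before parsing.
final class FeedCleaner {

    // Matches <?xml-stylesheet ... ?> processing instructions (XSL or CSS)
    // plus any whitespace that follows them. DOTALL lets the instruction
    // span several lines.
    private static final Pattern STYLESHEET_PI =
            Pattern.compile("<\\?xml-stylesheet\\b.*?\\?>\\s*", Pattern.DOTALL);

    private FeedCleaner() {
    }

    static String clean(String rawFeed) {
        // Null characters are never legal in XML, so this removes all of
        // them, not only the padding after the end of the feed.
        String withoutNulls = rawFeed.replace("\0", "");
        // Drop the styling directives; the <?xml ...?> declaration is kept.
        return STYLESHEET_PI.matcher(withoutNulls).replaceAll("");
    }
}
```

The cleaned string can then be wrapped in a `java.io.StringReader` and handed to the parser in place of the raw document.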
Unfortunately the last release was made in December 2006, and the project does not seem to have seen any work since. Hopefully someone will step up to the plate and take it on; I might when work lets up. The one obvious thing I would do is add generics to it.
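Until the library itself gets generics, a small adapter on the caller's side can take some of the sting out of raw collections. The sketch below is only an illustration: `TypedLists` and `typed` are hypothetical names of my own, and it assumes the library hands back an untyped `List` whose elements are all expected to be of one class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of ROME): copies an untyped list into a
// typed one, checking every element on the way.
final class TypedLists {

    private TypedLists() {
    }

    static <T> List<T> typed(List<?> raw, Class<T> type) {
        List<T> result = new ArrayList<>(raw.size());
        for (Object item : raw) {
            // Class.cast throws ClassCastException right here if an element
            // has an unexpected type, instead of much later at the use site.
            result.add(type.cast(item));
        }
        return result;
    }
}
```

With this in place, a call such as `TypedLists.typed(feed.getEntries(), SyndEntry.class)` would yield a `List<SyndEntry>` without casts scattered through the calling code.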
Recently I have been working on parsing RSS and ATOM feeds, as part of a larger project, and found ROME to be a pretty good toolkit to do that.
JavaWorld has a good article on how to use ROME.
ROME does a good job of hiding the little idiosyncrasies that distinguish the various feed types, and it also provides a mechanism to access foreign markup in the feeds, such as the Media RSS elements.