Finally Pinned Down my Java Issue

I finally pinned down the Java issue that I have been dealing with, documented in this post.

I finally hit on it when I was doing some work to lazy-load data in a key object, and found that performance was good when I was partly done and went to hell when I was fully done. I back-tracked and dug through the code, effectively cutting out pieces until I narrowed it down to this:

Set<URL> urlSet = new HashSet<URL>();
for ( ... ) { // loop over the links found in the HTML
 String href = "...";
 urlSet.add(new URL(href)); // This kills the JVM
}

What I am doing is extracting the URLs from a piece of HTML and using a Java Set to de-duplicate them. For some reason the add() call slows the JVM down to the point where it is just idling from the CPU’s point of view (2%-5% usage).

However, this works fine as a workaround:

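// De-duplicate on the plain strings first...
Set<String> hrefSet = new HashSet<String>();
for ( ... ) { // loop over the links found in the HTML
 String href = "...";
 hrefSet.add(href);
}

// ...then build the URL objects, which never get hashed.
List<URL> urlList = new ArrayList<URL>();
for ( String href : hrefSet ) {
 urlList.add(new URL(href));
}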
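
For reference, here is a self-contained sketch of the workaround above that can be compiled and run as-is. The class name HrefDeduper, the method toUniqueUrls and the example.com links are made up for illustration and are not from the original code. The sketch also wraps MalformedURLException in an unchecked exception, which is a choice made here for brevity.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; the class and method names are illustrative only.
class HrefDeduper {

    // De-duplicates on the String form first, so the set never calls
    // URL.hashCode() or URL.equals().
    static List<URL> toUniqueUrls(Iterable<String> hrefs) {
        Set<String> hrefSet = new HashSet<String>();
        for (String href : hrefs) {
            hrefSet.add(href);
        }
        List<URL> urlList = new ArrayList<URL>();
        for (String href : hrefSet) {
            try {
                urlList.add(new URL(href)); // the URLs go into a list and are never hashed
            } catch (MalformedURLException e) {
                // Assumption for this sketch: treat a bad link as a caller error.
                throw new IllegalArgumentException("Bad URL: " + href, e);
            }
        }
        return urlList;
    }

    public static void main(String[] args) {
        List<String> hrefs = Arrays.asList(
                "http://example.com/a",
                "http://example.com/a",
                "http://example.com/b");
        System.out.println(toUniqueUrls(hrefs).size()); // prints 2
    }
}
```

Note that this de-duplicates on the exact text of each link, so two links that differ only in the case of the host name are kept as separate entries.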

I am using the latest version of the JVM on CentOS Linux 5.0, dual 2.4GHz Xeon, 4GB RAM:

java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) Server VM (build 11.0-b15, mixed mode)

Finally, and not least, thanks to a colleague for suggesting testing the code with strings as opposed to URLs.

Update – my colleague pushed me to check the Java source code, and it turns out that java.net.URL.hashCode() delegates to the java.net.URLStreamHandler.hashCode() method, which does a DNS lookup on line 337 to work out the hash code:

InetAddress addr = getHostAddress(u);

Nice…
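
If the set needs to hold parsed objects rather than raw strings, java.net.URI is one option: its equals() and hashCode() work on the parsed components and do not touch the network. The sketch below is only an illustration under that assumption. The class name UriDeduper, the method toUniqueUris and the example.com links are made up and are not from the original code.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; the class and method names are illustrative only.
class UriDeduper {

    // Uses URI as the set element so hashing never triggers a DNS lookup.
    static Set<URI> toUniqueUris(Iterable<String> hrefs) {
        Set<URI> uriSet = new HashSet<URI>();
        for (String href : hrefs) {
            // URI.create throws IllegalArgumentException on a malformed link.
            uriSet.add(URI.create(href));
        }
        return uriSet;
    }

    public static void main(String[] args) {
        List<String> hrefs = Arrays.asList(
                "http://example.com/a",
                "http://EXAMPLE.com/a", // host differs only in case
                "http://example.com/b");
        System.out.println(toUniqueUris(hrefs).size()); // prints 2
    }
}
```

URI compares host names without regard to case, which is why the first two links collapse into one entry here. A plain String set would keep them apart.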


2 Responses to Finally Pinned Down my Java Issue

  1. Yoav Shapira says:

    Oh man! I should have recognized this when I read your blog post. I could have saved you some time and frustration. Sorry.

    The java.net.URL problem is well-known. Basically you should never use it as a key in a collection, because its equals() and hashCode() functions are too slow to be useful.

    Workarounds include using the URI class, which is nice and fast, or a String like you noted.

  2. Thanks. Actually I have been tracking this down for a while and the two people I showed it to had no idea (in their defense I think they probably discounted my making such a mistake.) I am glad I was able to solve it in the end, can’t think what Sun was thinking putting that in the URL class. I have checked other places where I used URL Sets and changed that.
