Testing Hadoop

I have been installing and testing Hadoop, a software platform that lets you easily write and run applications that store and process vast amounts of data.

I got the install working on my CentOS-based machine and on Mac OS X; the best starting point is the Getting Started With Hadoop page on the Hadoop Wiki.

I did run into some configuration issues:

  • Turn off your firewalls while installing the software; it makes life a lot easier, and you can turn them back on once everything is working.
  • Set the data paths explicitly. It is best to mirror the directory structure set out in the default configuration, but to put it on a filesystem with plenty of space, and then link to the root of that directory structure from somewhere like ‘/var/lib/hadoop’.
  • The installation and data directories need to be the same on all machines in your cluster; otherwise things will not work, hence the recommendation about the data paths above. This also helps keep the site configuration files mostly identical.
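The data-path advice above can be sketched in a few commands. This is illustrative only: the big-disk mount point is a stand-in for whatever roomy filesystem you have, and `/tmp/hadoop-link` stands in for `/var/lib/hadoop` so the sketch runs without root.

```shell
# Pretend this temp dir is the mount point of the large filesystem.
BIGDISK=$(mktemp -d)

# Mirror the default Hadoop layout (dfs name/data dirs) on the big disk.
mkdir -p "$BIGDISK/hadoop/dfs/data" "$BIGDISK/hadoop/dfs/name"

# Link it from a fixed, identical path on every node
# (in real life: ln -s "$BIGDISK/hadoop" /var/lib/hadoop).
ln -sfn "$BIGDISK/hadoop" /tmp/hadoop-link

ls /tmp/hadoop-link/dfs
```

Because every node sees the same `/var/lib/hadoop` path, the site configuration files can point at it and stay identical across the cluster.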

So far things work well, and I am impressed that it installed on Mac OS X; then again, there was no reason why it shouldn't.

This document over on Amazon is a very good primer on running Hadoop MapReduce (ignore the parts about EC2 and S3).
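For a feel of the MapReduce flow the primer describes, here is a toy word count as a Unix pipeline, in the spirit of Hadoop Streaming. The mapping to Hadoop's phases is my own rough analogy, not anything from the primer: `tr` plays the mapper (one word per line), `sort` stands in for the shuffle/grouping step Hadoop does for you, and `uniq -c` is the reducer.

```shell
# map:    emit one word per line
# shuffle: sort so identical keys are adjacent
# reduce: count occurrences per key
printf 'the cat sat\nthe mat\n' | tr -s ' ' '\n' | sort | uniq -c
```

In real Hadoop Streaming you would supply the mapper and reducer as separate scripts and let the framework handle the sort and the distribution across machines.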
