Yet more on “The Database Column vs. MapReduce”

The Database Column follows up on their original article about DBMSs and MapReduce.

Overall I think they do a very good job of addressing people’s comments about the issues. But I am left with one nagging thought triggered mainly by this sentence in the article:

As such, we claim that most things that are possible in MapReduce are also possible in a SQL engine.

The nagging thought is this: are we trying too hard to solve two very different problems with two very different technologies, and then claiming that one of those technologies can do it all? The authors present a very good example of something that is hard to do with MapReduce but easy to do with an RDBMS. What they don’t present is the flip side.

I would suggest that parsing text and building a full-text search index from it is a very good example of something that works really well in MapReduce, but would absolutely “suck” in an RDBMS.
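To make the indexing example concrete, here is a minimal single-process sketch of how an inverted index falls out of a map step and a reduce step. It is only an illustration, not how any particular framework does it: the function names (map_phase, shuffle, reduce_phase, build_index), the tokenizing regex, and the sample documents are all made up for this post, and a real MapReduce system would run the phases across many machines.

```python
import re
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit one (term, doc_id) pair per distinct word in a document."""
    for term in set(re.findall(r"[a-z0-9]+", text.lower())):
        yield term, doc_id


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(term, doc_ids):
    """Turn the grouped document ids for one term into a sorted posting list."""
    return term, sorted(doc_ids)


def build_index(docs):
    """Run map, shuffle, and reduce in sequence over a dict of documents."""
    pairs = (
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    )
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "MapReduce parses text",
    2: "An RDBMS joins tables",
    3: "MapReduce indexes text",
}
index = build_index(docs)
print(index["mapreduce"])
```

Each document is handled independently in the map step, which is why this kind of job spreads so easily across machines: nothing in map_phase needs to know about any other document.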

Processing large amounts of log data would be another good example.

And I am sure there are plenty of others.

I would suggest that a good rule of thumb is to look closely at the data you are processing. If it needs joins, whether many or just one, an RDBMS is probably a good choice. If your data is flat, then MapReduce is probably a good choice. I hedge here because every situation is different and has to be considered on its own merits and challenges.
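As a rough illustration of that rule of thumb, the sketch below puts the two shapes of data side by side. The first half uses Python's built-in sqlite3 module to join two related tables and aggregate in a single query; the second half counts status codes in flat log lines, where each record stands alone. The table names, columns, and log lines are invented for the example.

```python
import sqlite3
from collections import Counter

# Relational data: two tables that only make sense once they are joined.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 2, 3.0);
    """
)
totals = conn.execute(
    """
    SELECT c.name, SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
conn.close()

# Flat data: every log line can be processed without looking at any other.
log_lines = ["GET /a 200", "GET /b 404", "GET /a 200"]
status_counts = Counter(line.split()[-1] for line in log_lines)

print(totals)
print(dict(status_counts))
```

The join is a one-liner for the SQL engine, while the log count is a per-record operation followed by a sum, which is exactly the map-then-reduce shape.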

The bottom line is that someone who comes to me and says that “Language X” or “Tool Y” can do it all (and make me a cup of tea after that) is being a little jejune.

