François Schiettecatte’s Blog

February 1, 2008

Yet more on “The Database Column vs. MapReduce”

Filed under: Scaling, Search, Software Development — François Schiettecatte @ 10:26 am

The Database Column follows up on their original article about DBMSs and MapReduce.

Overall I think they do a very good job of addressing people’s comments about the issues. But I am left with one nagging thought triggered mainly by this sentence in the article:

As such, we claim that most things that are possible in MapReduce are also possible in a SQL engine.

The nagging thought is this: are we trying too hard when we claim that one technology can do it all, given that these are two very different technologies aimed at very different problems? The authors present a very good example which is hard to do with MapReduce but easy to do with an RDBMS. What they don’t present is the flip side.

I would suggest that parsing text and building a full-text search index from it is a very good example of something that works really well in MapReduce, but would absolutely “suck” in an RDBMS.
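To make the indexing case a little more concrete, here is a rough single-process sketch of the map and reduce steps for building an inverted index. It is only an illustration of the shape of the job, not any particular framework's API: the function names, the tokenizer, and the sample documents are all my own assumptions.

```python
import re
from collections import defaultdict


def map_document(doc_id, text):
    """Map step: emit one (term, doc_id) pair per distinct term in a document.

    The tokenizer (lowercase runs of letters and digits) is an assumption
    made for this sketch; a real indexer would do far more.
    """
    terms = set(re.findall(r"[a-z0-9]+", text.lower()))
    return [(term, doc_id) for term in terms]


def reduce_postings(pairs):
    """Shuffle and reduce step: group pairs by term into sorted posting lists."""
    postings = defaultdict(set)
    for term, doc_id in pairs:
        postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def build_index(docs):
    """Run the map step over every document, then reduce all emitted pairs."""
    pairs = []
    for doc_id, text in docs.items():
        pairs.extend(map_document(doc_id, text))
    return reduce_postings(pairs)


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "MapReduce handles flat data",
    2: "An RDBMS handles joins",
}
index = build_index(docs)
```

Each document is mapped independently of every other one, which is why the work spreads so easily across machines; the only coordination is grouping the emitted pairs by term.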

Processing large amounts of log data would be another good example.
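As a sketch of why log data fits the model, here is a toy map and reduce pair that counts requests per status code. The three-field log format, the function names, and the sample lines are hypothetical, chosen only to keep the example short.

```python
from collections import Counter


def map_line(line):
    """Map step: emit (status, 1) for one log line.

    Assumes a made-up format of "<timestamp> <path> <status>";
    lines that do not match are skipped rather than raising.
    """
    parts = line.split()
    if len(parts) != 3:
        return []
    return [(parts[2], 1)]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


# Hypothetical sample log lines.
log_lines = [
    "2008-02-01T10:26:00 /index.html 200",
    "2008-02-01T10:26:01 /missing 404",
    "2008-02-01T10:26:02 /index.html 200",
    "a malformed line that is ignored by the map step",
]

pairs = [pair for line in log_lines for pair in map_line(line)]
status_counts = reduce_counts(pairs)
```

Every line is processed on its own and no line needs to be joined against another, which is the sort of flat data where a MapReduce-style job is a natural fit.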

And I am sure there are plenty of others.

I would suggest that a good rule of thumb would be to look closely at the data you are processing. If there are lots of joins, or even one, an RDBMS would probably be a good choice. If your data is flat, then MapReduce would probably be a good choice. I hedge here because every situation is different and has to be considered on its own challenges and merits.

The bottom line is that someone who comes to me and says that “Language X” or “Tool Y” can do it all (and make me a cup of tea after that) is being a little jejune.
