Scaling Without A Database

Before I wrote up an article entitled “Scaling and Uptime”, I came across another article by Frank Sommers entitled “Scaling Without A Database” which is well worth reading. It is a take on another article by Robert McIntosh entitled “Building a high volume app without a RDMS or Domain objects.”

Both articles address the interesting issue of scaling without a database. I quote:

McIntosh’s basic thesis is centered around three observations. The first one is that true scalability can best be achieved in a shared-nothing architecture. Not all applications can be designed in a completely shared-nothing fashion—for instance, most consumer-facing Web sites that need the sort of scaling McIntosh envisions require access to a centralized user database (unless a single sign-on solution is used). But a surprising number of sites could be partitioned into sections with little shared data between the various site areas.

This just comes back to what I was saying earlier about partitioning data to avoid bottlenecks. I think it would be close to impossible to build a site that did anything worthwhile without having some sort of database system behind it, but that does not mean that scaling is held hostage by that database. Partitioning is the key, both vertical (splitting different kinds of data across separate systems) and horizontal (splitting one kind of data across several servers).
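To make the horizontal case concrete, here is a minimal sketch, assuming records can be addressed by a string key. The names shard_for and PartitionedStore are made up for illustration, and the in-memory dictionaries stand in for what would be separate servers in a real deployment.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard number using a hash that is stable across runs."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class PartitionedStore:
    """Toy horizontally partitioned store; each dict stands in for one server."""

    def __init__(self, shard_count: int) -> None:
        self.shards = [dict() for _ in range(shard_count)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        # Only the one shard that owns the key is consulted.
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

A lookup only ever touches the shard that owns the key, so no single store has to carry the whole load. The catch with plain modulo hashing is that changing the number of shards moves most keys, which is why real systems tend to use something more elaborate.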

McIntosh’s final observation is that although modern Web frameworks speed up development already, a new level of rapid development can possibly be reached by managing data in plain files, such as XML:

Well, this seems obvious at first glance. If you are just going to use text files to store data, accessing that data is going to be very fast since you are not dealing with the overhead incurred if you stored that data in an RDBMS. The flip side of this is that your application suddenly has to deal with parsing said text files, organizing said text files and finding said text files. All you have really done there is shift where the work gets done; you have not escaped the fact that it needs to get done.
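As a rough illustration of that shifted work, here is a small sketch, assuming one XML file per post. The directory layout, the element names and the functions post_path, save_post and load_post are all hypothetical, not anything taken from McIntosh's article.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def post_path(root: Path, post_id: int) -> Path:
    """Organizing and finding: bucket files by the last two digits of the id."""
    return root / f"{post_id % 100:02d}" / f"{post_id}.xml"


def save_post(root: Path, post_id: int, title: str, body: str) -> None:
    path = post_path(root, post_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    post = ET.Element("post", id=str(post_id))
    ET.SubElement(post, "title").text = title
    ET.SubElement(post, "body").text = body
    ET.ElementTree(post).write(str(path), encoding="utf-8", xml_declaration=True)


def load_post(root: Path, post_id: int) -> dict:
    """Parsing: turn the file back into something the application can use."""
    post = ET.parse(str(post_path(root, post_id))).getroot()
    return {
        "id": int(post.get("id")),
        "title": post.findtext("title"),
        "body": post.findtext("body"),
    }


def round_trip_demo() -> dict:
    # Write one post into a throwaway directory and read it back.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        save_post(root, 1234, "Hello", "World")
        return load_post(root, 1234)
```

Even in this tiny version the application owns the file naming scheme, the directory layout and the parsing, and it still has no answer for concurrent writers or for any query other than lookup by id.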

Another reason to entertain some of McIntosh’s notions is that quick access to large amounts of data occurs through indexes—be those indexes managed by a relational database or indexes created ex-database, such as with Lucene. An application relying on, say, XML-based files for data storage could generate the exact indexes it needs in order to provide query execution capabilities over the files. And, in general, ex-database indexes have proven more scalable than database-specific indexes: Not only can such indexes be maintained in a distributed fashion, they can also be custom-tailored to the exact retrieval patterns of an application.

I would concur with this and point out that this is not really that new. I worked on a system in 1993 which used Sybase to store data, and extracted that data to be indexed by a full text search engine. When Feedster first started in 2003, we used the full text search engine that was built into the MySQL server. Once we reached 1 million posts, the whole RDBMS became really slow, at which point we split the full text searching out from the RDBMS, and the system performed well again. The main message here is that you have to build systems leveraging the strengths of each component and not be afraid to bring in more specialised components if they get the job done better.
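The general shape of that split can be sketched as follows. This is only an illustration under stated assumptions, not how the 1993 system or Feedster was built: an in-memory SQLite database stands in for the RDBMS, a plain dictionary stands in for a real engine such as Lucene, and the posts table and the functions tokenize, build_index, search and demo are made up.

```python
import re
import sqlite3
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(conn: sqlite3.Connection) -> dict:
    """Extract rows from the database and build an inverted index outside it."""
    index = defaultdict(set)
    for post_id, body in conn.execute("SELECT id, body FROM posts"):
        for term in tokenize(body):
            index[term].add(post_id)
    return index


def search(index: dict, query: str) -> set:
    """Return the ids of posts that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


def demo() -> set:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?)",
        [
            (1, "Scaling without a database"),
            (2, "Scaling and uptime"),
            (3, "Full text search outside the database"),
        ],
    )
    return search(build_index(conn), "scaling database")
```

The database stays the system of record while the search load goes to the index, which can be rebuilt from the database at any time and placed on separate machines.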

4 Responses to Scaling Without A Database

  1. I never said any of my thoughts were ground breaking :-) After reading a few comments to my post, I realized that what I was trying to convey didn’t quite reach everyone in the way I wanted. I have no issues with relational databases. Don’t hate them, don’t dislike them. To be honest, what I don’t like are tools like Hibernate that try to shield you from said databases. After all, SQL is pretty darn powerful and isn’t hard for what most people need to do.

    Everything that you point out in this post is completely valid. Yes, you have to process that XML in some way; yes, you need to partition correctly and appropriately. None of that changes with or without an RDBMS (as you mention in your last paragraph). The overall theme of my post was more about whether we NEED a relational DB for EVERY app we build. Are there other alternatives that can simplify things? That is where I was going.

    With that in mind, I’m actually putting my hypothesis to the test very soon and I’ll write about it along the way.

    Regardless, thanks for the comments. You bring up good points that I didn’t touch on.

  2. I agree with you completely, we don’t need an RDBMS for every project. If you are going to store key/value pairs, I think that BerkeleyDB is perfectly adequate, and if you are really into SQL, then SQLite may be a good solution.

    And thank you for your comments.

  3. Jim says:

    There are some web frameworks that could help to achieve this goal. Nenest is one such tool: you don’t need to host a database on your site, you just use its API (from the IFRAME tag to the HTML API and the XML API) to integrate the data with your pages.

  4. Jim

    Aside from the sales pitch (I still approved the comment since I thought Nenest was an interesting idea), what you are doing there is moving the storage from the web site in question to another place. Data still has to be stored somewhere, and any web site which uses Nenest becomes dependent on its availability and scaling capabilities, so the bottleneck problems may well remain.
