“Disk is the new RAM”, or is it the other way around

By way of Greg Linden, I just finished watching a tech talk given by Professor Gene Cooperman on “Disk-Based Parallel Computation, Rubik’s Cube, and Checkpointing”.

The part that interested me was Cooperman’s assertion that “disk is the new RAM”: with enough machines in your cluster, you reach a point where the aggregate bandwidth of the disks matches that of RAM.
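
A rough back-of-envelope makes the claim concrete (the figures below are my own assumptions for illustration, not numbers from the talk): if a single disk streams sequentially at around 100 MB/s and a node’s RAM delivers on the order of 10 GB/s, you need roughly a hundred disks across the cluster before the aggregate disk bandwidth catches up.

    # Back-of-envelope: how many disks match one node's RAM bandwidth?
    # Both figures are assumptions for illustration, not from the talk.
    DISK_MB_PER_S = 100      # assumed sequential throughput of a single disk
    RAM_MB_PER_S = 10_000    # assumed memory bandwidth of a single node

    disks_needed = RAM_MB_PER_S / DISK_MB_PER_S
    print(f"~{disks_needed:.0f} disks to match RAM bandwidth")  # prints ~100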

The first obvious objection is that this does not seem to make sense: while the bandwidth may be comparable, the latency is quite different and will be a performance killer.

Cooperman recognizes this and makes the point that you need to organize your disk accesses to minimize latency, basically avoiding piecemeal reads in favor of batch reads. Effectively, you are shuttling data to and from memory in very large batches. (As an aside, this is nothing new; the Connection Machine 5 had a similar disk system, and I am sure there are other such systems out there.)
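
To make the batching idea concrete, here is a minimal sketch (plain Python over a local file; the file name and chunk size are mine, purely for illustration). The point is one large sequential read per pass, so the seek latency is amortized over many megabytes instead of being paid on every small access.

    # Minimal sketch: stream a big file in large sequential chunks and
    # process each chunk entirely in memory, instead of issuing many
    # small random reads. Chunk size is an arbitrary illustrative value.
    CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB per read

    def process_in_batches(path, handle_chunk):
        """Read `path` sequentially in CHUNK_SIZE pieces."""
        with open(path, "rb") as f:
            while True:
                chunk = f.read(CHUNK_SIZE)   # one large sequential read
                if not chunk:                # end of file
                    break
                handle_chunk(chunk)          # work on the batch in RAM

    # Hypothetical usage:
    # process_in_batches("webserver.log", lambda c: print(len(c)))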


2 Responses to “Disk is the new RAM”, or is it the other way around

  1. noel says:

    It doesn’t seem practical or even possible to always organize your reads into non-random accesses, at least not in an RDBMS context.

  2. I agree with you that this is not possible in an RDBMS context; this is really aimed at tackling problems where you need to process data in large chunks, like log files or text, for example.
