March 30, 2008 2 Comments
By way of Greg Linden, I just finished watching a tech talk given by Professor Gene Cooperman on “Disk-Based Parallel Computation, Rubik’s Cube, and Checkpointing”.
The part that interested me was Cooperman’s assertion that “disk is the new RAM”: if you have enough machines in your cluster, you reach a point where the aggregate bandwidth of the disks matches that of RAM.
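A quick back-of-envelope calculation shows why the claim is at least plausible. The numbers below are my own rough assumptions for 2008-era hardware, not figures from the talk:

```python
# Illustrative assumptions (not from the talk):
DISK_STREAM_MBPS = 60        # sequential throughput of a single disk
RAM_BANDWIDTH_MBPS = 6_000   # memory bandwidth of a single machine

disks_needed = RAM_BANDWIDTH_MBPS / DISK_STREAM_MBPS
print(f"disks to match one machine's RAM bandwidth: {disks_needed:.0f}")
```

Under these assumptions a cluster of around a hundred disks streams data as fast as one machine can read its own RAM, which is the scale Cooperman has in mind.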
The first objection that jumps out is that this does not make sense: while the bandwidth may be comparable, the latency is quite different, and that will be a performance killer.
Cooperman recognizes this and makes the point that you need to organize your disk accesses to minimize latency, basically avoiding piecemeal reading and going for batch reading. Effectively, what you are doing is shuttling data to and from memory in very large batches. (As an aside, this is nothing new: the Connection Machine 5 had a similar disk system, and I am sure there are other such systems out there.)
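The piecemeal-versus-batch distinction can be sketched in a few lines. This is my own illustration of the access-pattern idea, not code from the talk; the file and sizes are made up:

```python
import os
import tempfile

# Create a scratch file standing in for a large on-disk dataset.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(os.urandom(1 << 20))  # 1 MiB of dummy data

def read_piecemeal(path, record_size=64):
    """Many tiny reads: each one risks paying the seek/latency cost."""
    chunks = []
    with open(path, "rb") as f:
        while (rec := f.read(record_size)):
            chunks.append(rec)
    return b"".join(chunks)

def read_batched(path, batch_size=1 << 20):
    """A few large sequential reads: latency is amortized over the batch."""
    chunks = []
    with open(path, "rb") as f:
        while (batch := f.read(batch_size)):
            chunks.append(batch)
    return b"".join(chunks)

# Both return the same bytes; only the access pattern differs.
assert read_piecemeal(path) == read_batched(path)
```

The batched version issues roughly 16,000x fewer read calls for the same data, which is the whole trick: pay the disk's latency once per large transfer instead of once per record.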