Friday, November 15, 2013

Memory vs Disk for Data Platforms

Some interesting charts. I wrote this in response to a colleague asking my thoughts on a paper Intel released.

Graph of memory and disk prices:

Corporate Data Growth:

Of the two charts above, only one can be shown in a browser without resorting to a logarithmic scale.

Last night I had to copy 10 GB of files to a spinning disk on my desktop (something to be avoided whenever possible these days). It went at a blistering 10 Mb/s. The NIC in the machine operates at 1 Gb/s, and my Internet connection is 75 Mb/s. Something is wrong here.
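To put rough numbers on that gap, here is a small back-of-the-envelope sketch. It assumes the file size is in gigabytes and all three rates are in megabits per second, using decimal units; those readings of the units are my assumption, and the helper name transfer_seconds is made up for this example.

```python
# Back-of-the-envelope transfer times for the copy described above.
# Assumptions: size is in gigabytes, rates are in megabits per second,
# and decimal units apply (1 GB = 1000 MB = 8000 Mb).

def transfer_seconds(size_gigabytes: float, rate_megabits_per_s: float) -> float:
    """Seconds needed to move size_gigabytes at rate_megabits_per_s."""
    size_megabits = size_gigabytes * 1000 * 8
    return size_megabits / rate_megabits_per_s

SIZE_GB = 10
RATES_MBPS = {
    "observed disk copy": 10,
    "Internet connection": 75,
    "NIC": 1000,
}

for label, rate in RATES_MBPS.items():
    minutes = transfer_seconds(SIZE_GB, rate) / 60
    print(f"{label:>20}: {minutes:7.1f} min")
```

Under these assumptions the copy takes a bit over two hours at the observed rate, versus about 80 seconds if the disk could keep up with the NIC.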

Cool new in-memory projects that are gaining momentum:

The important chart:

After looking at the evidence, I'll comfortably assert that disk is dying as a medium for anything other than archival storage. This is a different strategy from the cache optimization of RDBMSs and related technologies. However, optimizing code and algorithms to avoid cache misses is still cool and useful.

Because corporate data is growing more slowly than memory is getting cheaper and more plentiful, it makes sense to seriously evaluate architectures with enough memory to hold entire data sets.
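The argument comes down to two compounding rates, and a quick sketch makes the shape of it visible. Every number below is a hypothetical placeholder and is not taken from the charts; the function name memory_cost_over_time is also invented for this example.

```python
def memory_cost_over_time(data_gb, price_per_gb, data_growth, price_decline, years):
    """Cost per year of holding the whole data set in memory.

    data_growth and price_decline are annual fractions (0.25 means 25%).
    Returns a list with one cost per year, starting at year 0.
    """
    costs = []
    for year in range(years + 1):
        size = data_gb * (1 + data_growth) ** year
        price = price_per_gb * (1 - price_decline) ** year
        costs.append(size * price)
    return costs

# Hypothetical inputs, chosen only to illustrate the argument:
# 1 TB of data growing 25% a year, memory prices falling 30% a year.
costs = memory_cost_over_time(
    data_gb=1000, price_per_gb=10.0,
    data_growth=0.25, price_decline=0.30, years=5,
)
for year, cost in enumerate(costs):
    print(f"year {year}: {cost:,.0f}")
```

The cost of keeping everything in memory falls year over year whenever (1 + data_growth) * (1 - price_decline) is below 1, and rises otherwise, so the conclusion depends entirely on which rate is larger in the real data.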

I suspect that in the foreseeable future memory will creep closer to the cores (much bigger caches) or the cores will creep closer to the memory (new architectures?). This doesn't yet seem to be something folks are looking into, though, because of the current lack of software written to run well with these new capabilities.