UCSC-SOE-12-13: SciHadoop Semantic Compression

Adam Crume, Joe Buck, Noah Watkins, Carlos Maltzahn, Scott Brandt, Neoklis Polyzotis
08/16/2012 12:23 PM
Computer Science
Many scientific applications, when written in the MapReduce paradigm, naturally use grid coordinates as keys. Unfortunately, a straightforward representation of these intermediate keys incurs an enormous amount of overhead. We show how grid coordinates can be stored compactly, yielding a significant reduction in intermediate data size. This is an important step toward making MapReduce systems such as Hadoop more attractive to developers of scientific applications.
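
The abstract does not spell out the encoding, so the following is only an illustrative sketch of why coordinate keys can be stored compactly, not the report's actual method. It assumes the keys in question form a dense rectangular block of the grid, which can then be described by a corner and a shape instead of one coordinate tuple per cell. The function names (`naive_size`, `encode_block`, `decode_block`) and the 4-byte integer layout are hypothetical choices made for this example.

```python
import struct
from itertools import product


def naive_size(shape, bytes_per_int=4):
    """Bytes used if every cell's full coordinate is written as its own key."""
    cells = 1
    for extent in shape:
        cells *= extent
    return cells * len(shape) * bytes_per_int


def encode_block(corner, shape):
    """Pack a dense block of keys as: dimension count, corner, shape.

    Assumed layout (big-endian): one unsigned int for the number of
    dimensions, then the corner as signed ints, then the shape as
    unsigned ints.
    """
    ndim = len(shape)
    return struct.pack(f">I{ndim}i{ndim}I", ndim, *corner, *shape)


def decode_block(buf):
    """Expand an encoded block back into the individual coordinate keys."""
    (ndim,) = struct.unpack_from(">I", buf, 0)
    values = struct.unpack_from(f">{ndim}i{ndim}I", buf, 4)
    corner, shape = values[:ndim], values[ndim:]
    return [
        tuple(c + o for c, o in zip(corner, offset))
        for offset in product(*(range(n) for n in shape))
    ]


corner, shape = (10, 20, 30), (2, 3, 4)
encoded = encode_block(corner, shape)
keys = decode_block(encoded)
```

In this sketch the encoded size depends only on the number of dimensions, while the naive size grows with the number of cells, so the saving increases with block size. Keys that do not form dense blocks would need a different scheme.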