UCSC-SOE-18-11: Toward Efficient Data Prefetching Algorithms for Scientific Domains

Reza NasiriGerdeh, Carlos Maltzahn, Frank Wurthwein, Brian Bockelman, Matevz Tadel, Michael A. Sevilla
05/17/2018 10:07 PM
Computer Science
The recent increase in the computing capability of commercial clouds has motivated scientific communities to move their applications and infrastructure to the cloud. Data prefetching plays a crucial role in this new cloud-based infrastructure, in which client applications access large volumes of scientific data residing on a remote storage system. Because each scientific domain has its own data format and its own client applications, common prefetching algorithms, such as those based on byte streams, do not work efficiently for every scientific domain. In this paper, we argue that efficient data prefetching algorithms for specialized scientific use cases can be designed using domain-specific knowledge. We propose a novel prefetching algorithm for a particular scientific domain, which leverages domain-specific knowledge about the logical layout and the transfer unit of the data. We evaluate the efficiency of our algorithm through simulation, using accuracy and recall as metrics.
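The abstract does not define its accuracy and recall metrics. As a minimal sketch, assuming the usual reading for prefetchers (accuracy is the fraction of prefetched units that the client actually read, recall is the fraction of read units that had been prefetched), they could be computed as follows. The function name `prefetch_metrics` and the use of integer unit identifiers are hypothetical and do not come from the paper.

```python
def prefetch_metrics(prefetched, accessed):
    """Return (accuracy, recall) for one simulated run.

    Assumed definitions (not taken from the paper):
      accuracy = |prefetched AND accessed| / |prefetched|
      recall   = |prefetched AND accessed| / |accessed|
    Both arguments are iterables of hashable transfer-unit identifiers.
    """
    prefetched = set(prefetched)
    accessed = set(accessed)
    useful = prefetched & accessed  # units that were prefetched and then read

    # Guard against empty sets to avoid division by zero.
    accuracy = len(useful) / len(prefetched) if prefetched else 0.0
    recall = len(useful) / len(accessed) if accessed else 0.0
    return accuracy, recall


if __name__ == "__main__":
    # Hypothetical trace: four units prefetched, three units read by the client.
    acc, rec = prefetch_metrics([1, 2, 3, 4], [2, 3, 5])
    print(f"accuracy={acc:.2f} recall={rec:.2f}")
```

Under these assumed definitions, a prefetcher that fetches everything scores perfect recall but poor accuracy, which is why the two metrics are reported together.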

