UCSC-SOE-19-08: Distributed Implementation of Nearest-Neighbor Gaussian Processes

Isabelle Grenier and Bruno Sansó
09/12/2019 04:30 PM
While many statistical approaches have tackled the issue of large spatial datasets, the issues arising from costly data movement and data storage have long been set aside. Easy access to the data has been taken for granted, and data access is now becoming an important bottleneck in the performance of statistical inference. As the availability of high-resolution spatial data continues to grow, developing efficient modeling techniques is becoming a priority. In this paper, we develop a distributed method for Nearest-Neighbor Gaussian Process (NNGP) models as a solution for large datasets. The proposed framework retains the exact implementation of the NNGP while allowing the posterior inference to be computed in parallel. The method allows for any grouping of the data, whether at random or by region. As a result, the NNGP model can be applied to a dataset with n observations split across J servers, with computations of order n/J.
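Why an arbitrary grouping can keep the NNGP exact comes down to how its density is built. It is a product of univariate conditionals, p(w) = ∏_i N(w_i | B_i w_N(i), F_i), where N(i) holds up to m nearest neighbors that come earlier in the ordering. The log-density is therefore a sum of per-location terms, and any partition of the indices into J groups can add up its own terms separately. The sketch below is a minimal illustration under stated assumptions, not the authors' algorithm. It assumes an exponential covariance with hypothetical parameters `sigma2=1.0`, `phi=3.0` and `m=5`, evaluates only the log-density of a latent vector w rather than running full posterior inference, and keeps everything in one process. On real servers, each group would need the neighbor values w_N(i) that live on other servers. All function names are hypothetical.

```python
import math
import random

def exp_cov(a, b, sigma2=1.0, phi=3.0):
    """Exponential covariance between 2-D locations (assumed kernel and parameters)."""
    return sigma2 * math.exp(-phi * math.dist(a, b))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def neighbor_sets(locs, m):
    """N(i): indices of up to m nearest neighbors among locations 0..i-1."""
    return [sorted(range(i), key=lambda j: math.dist(locs[i], locs[j]))[:m]
            for i in range(len(locs))]

def term(i, locs, w, nbrs):
    """log N(w_i | B_i w_N(i), F_i): one factor of the NNGP density."""
    N = nbrs[i]
    c_ii = exp_cov(locs[i], locs[i])
    if not N:                      # first location: marginal N(0, c_ii)
        mean, var = 0.0, c_ii
    else:
        C_NN = [[exp_cov(locs[a], locs[b]) for b in N] for a in N]
        c_Ni = [exp_cov(locs[a], locs[i]) for a in N]
        B = solve(C_NN, c_Ni)      # C_NN^{-1} c_Ni, so B_i = c_Ni^T C_NN^{-1}
        mean = sum(b * w[j] for b, j in zip(B, N))
        var = c_ii - sum(b * c for b, c in zip(B, c_Ni))   # F_i
    return -0.5 * (math.log(2 * math.pi * var) + (w[i] - mean) ** 2 / var)

def partial_loglik(group, locs, w, nbrs):
    """Work assigned to one hypothetical 'server': the factors for its own indices."""
    return sum(term(i, locs, w, nbrs) for i in group)

random.seed(0)
n, m, J = 60, 5, 4
locs = [(random.random(), random.random()) for _ in range(n)]
w = [random.gauss(0.0, 1.0) for _ in range(n)]
nbrs = neighbor_sets(locs, m)

full = partial_loglik(range(n), locs, w, nbrs)

idx = list(range(n))
random.shuffle(idx)                      # a random grouping of the data
groups = [idx[j::J] for j in range(J)]   # J groups of about n/J indices each
split = sum(partial_loglik(g, locs, w, nbrs) for g in groups)

print(abs(full - split) < 1e-9)  # True
```

Because the neighbor sets and the ordering do not depend on the grouping, the J partial sums recover the undistributed log-density up to floating-point rounding. A grouping by region would only change `groups`, not the result. Each group evaluates about n/J factors, which is where the n/J cost per server comes from in this simplified setting.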