UCSC-SOE-21-01: Distributed Bayesian Kriging

R. Guhaniyogi, C. Li, T.D. Savitsky and S. Srivastava
05/12/2021 12:42 PM
We propose a three-step divide-and-conquer strategy for fitting
Bayesian spatial process regression models that scales to massive data
sets. We partition the data into a large number of subsets, apply a readily
available Bayesian spatial process model in parallel on all the subsets, and
optimally combine the posterior distributions estimated across all the subsets
into a pseudo posterior distribution that conditions on the entire data.
The combined pseudo posterior distribution replaces the full data posterior
distribution for predicting the responses at arbitrary locations and for
inference on the model parameters and spatial surface. Based on distributed
Bayesian inference, our approach is called "Distributed Kriging" (DISK)
and offers significant advantages in massive data applications where the
full data are stored across multiple machines. We show theoretically that
the Bayes L2-risk of the DISK posterior distribution achieves the near-optimal
convergence rate in estimating the true spatial surface with various
types of covariance functions and provide upper bounds for the number of
subsets for achieving these convergence rates. The model-free feature of
DISK is demonstrated by scaling posterior computations in spatial process
models with a stationary full-rank and a nonstationary low-rank Gaussian
process (GP) prior. A variety of simulations and a geostatistical analysis
of the Pacific Ocean sea surface temperature data validate our theoretical