UCSC-SOE-20-03: Bayesian Dynamic Feature Partitioning in High-Dimensional Regression with Big Data

Rene Gutierrez and Rajarshi Guhaniyogi
06/21/2020 09:30 PM
Bayesian computation for high dimensional regression models using Markov chain Monte Carlo (MCMC) or its variants is too slow or completely prohibitive, since these methods perform costly computations at each iteration of the sampling chain. Furthermore, this computational cost usually cannot be divided efficiently across a parallel architecture. These problems are aggravated when the data size is large or the data arrive sequentially over time (streaming or online settings). This article proposes a novel dynamic feature partitioned regression (DFP) approach for efficient online inference in high dimensional regressions with large or streaming data.
DFP constructs a pseudo posterior density of the parameters at every time point and quickly updates it when a new block of data (data shard) arrives. At each update, DFP partitions the parameter space to exploit parallelization for efficient posterior computation. The proposed approach is applied to high dimensional linear regression models with Gaussian scale mixture priors and spike and slab priors on large parameter spaces, along with large data, and is found to yield state-of-the-art inferential performance. The algorithm enjoys theoretical support: the pseudo posterior densities over time become arbitrarily close to the full posterior as the data size grows.
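The general idea of shard-wise updating with a partitioned parameter space can be illustrated with a minimal sketch. This is not the authors' DFP algorithm. It assumes a simple conjugate Gaussian prior (rather than the Gaussian scale mixture or spike and slab priors studied in the report), known noise variance, and it conditions each feature block on plug-in point estimates of the other blocks. The names `update_stats` and `block_update`, and all numeric settings, are hypothetical and chosen for illustration only.

```python
import numpy as np

def update_stats(stats, X, y):
    """Fold one data shard into the running sufficient statistics."""
    stats["XtX"] += X.T @ X
    stats["Xty"] += X.T @ y
    return stats

def block_update(stats, beta, blocks, sigma2=1.0, tau2=10.0):
    """One sweep of block-wise conditional posterior mean updates.

    Each block of coefficients is updated given the current values of
    all other blocks (assumed prior: beta ~ N(0, tau2 * I)).
    """
    beta = beta.copy()
    for idx in blocks:
        rest = np.setdiff1d(np.arange(beta.size), idx)
        # Conditional posterior precision of this block.
        prec = stats["XtX"][np.ix_(idx, idx)] / sigma2 + np.eye(idx.size) / tau2
        # Cross-product with the response, net of the other blocks' contribution.
        resid = stats["Xty"][idx] - stats["XtX"][np.ix_(idx, rest)] @ beta[rest]
        beta[idx] = np.linalg.solve(prec, resid / sigma2)
    return beta

# Simulated stream: 20 shards of 50 observations, 6 features in 3 blocks.
rng = np.random.default_rng(0)
p, n_shard, n_shards = 6, 50, 20
beta_true = np.array([2.0, 0.0, -1.0, 0.0, 0.5, 0.0])
blocks = np.array_split(np.arange(p), 3)

stats = {"XtX": np.zeros((p, p)), "Xty": np.zeros(p)}
beta_hat = np.zeros(p)
for t in range(n_shards):
    X = rng.normal(size=(n_shard, p))
    y = X @ beta_true + rng.normal(size=n_shard)
    stats = update_stats(stats, X, y)                  # absorb the new shard
    beta_hat = block_update(stats, beta_hat, blocks)   # one sweep per shard

print(np.round(beta_hat, 2))
```

In this sketch the blocks are visited sequentially within a sweep; a parallel variant would update all blocks at once from the previous sweep's values. Storing the full `XtX` matrix is only practical for small illustrative problems and would need to be replaced in a genuinely high dimensional setting.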