Shuangjie Zhang, Yuning Shen, Irene A. Chen, Juhee Lee
05/14/2024 11:53 AM
Statistics
Group factor models have been developed to infer relationships between multiple co-occurring multivariate continuous responses. Motivated by complex count data
from multi-domain microbiome studies using next-generation sequencing, we develop a sparse Bayesian group factor model (Sp-BGFM) for multiple count table data that
captures the interaction between microorganisms in different domains. Sp-BGFM models count vectors with a rounded kernel mixture model, using a Dirichlet process (DP) prior with lognormal
mixture kernels. A group factor model is used for the covariance matrix of the mixing kernel, which describes microorganism interaction. We
construct a Dirichlet-Horseshoe (Dir-HS) shrinkage prior and use it as a joint prior for factor loading vectors. The joint sparsity induced by the Dir-HS prior greatly improves
performance in high-dimensional applications. We further model the effects of covariates on microbial abundances using regression. The semiparametric model flexibly
accommodates large variability in observed counts and excess zero counts and provides a basis for robust estimation of the interaction and covariate effects. We
evaluate Sp-BGFM using simulation studies and real data analysis, comparing it to popular alternatives. Our results highlight the necessity of joint sparsity induced by
the Dir-HS prior, and the benefits of a flexible DP model for baseline abundances.
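
To make the rounded lognormal kernel and the factor-structured covariance concrete, the following is a minimal illustrative sketch, not the authors' implementation. It simulates a single kernel only: it has no DP mixture, no Dir-HS prior, and no covariates. The function names, the use of floor as the rounding rule, and the isotropic noise term are all assumptions made for illustration.

```python
import numpy as np


def implied_log_covariance(loadings, noise_sd):
    """Covariance of the latent log-abundances under a factor model.

    Assumes (hypothetically) Sigma = Lambda Lambda^T + noise_sd^2 * I.
    """
    n_taxa = loadings.shape[0]
    return loadings @ loadings.T + noise_sd**2 * np.eye(n_taxa)


def simulate_rounded_lognormal_counts(n_samples, loadings, baseline, noise_sd, seed=0):
    """Draw counts by rounding down a latent multivariate lognormal vector.

    loadings : (n_taxa, n_factors) array; rows may stack taxa from several domains.
    baseline : (n_taxa,) array of mean log-abundances.
    The floor rule is an assumed rounding scheme: latent values below 1 become 0,
    which is one way a rounded kernel can produce excess zeros.
    """
    rng = np.random.default_rng(seed)
    n_taxa, n_factors = loadings.shape
    factors = rng.standard_normal((n_samples, n_factors))
    noise = noise_sd * rng.standard_normal((n_samples, n_taxa))
    log_abundance = baseline + factors @ loadings.T + noise
    latent = np.exp(log_abundance)            # lognormal latent abundance
    counts = np.floor(latent).astype(np.int64)  # rounding step (assumed: floor)
    return counts, latent


# Two hypothetical domains stacked row-wise: taxa 0-1 in one, taxon 2 in the other.
# Zeros in the loading matrix mimic the sparsity a shrinkage prior would encourage.
loadings = np.array([[1.0, 0.0],
                     [0.8, 0.0],
                     [0.0, 0.5]])
counts, latent = simulate_rounded_lognormal_counts(
    n_samples=50, loadings=loadings, baseline=np.zeros(3), noise_sd=0.3
)
sigma = implied_log_covariance(loadings, noise_sd=0.3)
```

In this sketch, taxa that load on a shared factor (rows 0 and 1) have nonzero off-diagonal entries in `sigma`, while a zero loading pattern (rows 0 and 2) gives zero covariance, which is how sparse loadings translate into sparse interaction estimates.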