ABSTRACT High-dimensional data is becoming increasingly prevalent in scientific fields such as genomics, economics, and medicine. However, recent studies have indicated that such data is often heterogeneous. Most existing research focuses either on high-dimensional multi-source data or on isolated heterogeneous datasets, leaving a significant gap in the joint modeling and inference of high-dimensional multi-source heterogeneous data. In this paper, we employ a Bayesian method to address the estimation problem for high-dimensional multi-source heterogeneous data, aiming to extract features shared across all subpopulations while also capturing the heterogeneity unique to each subpopulation. We introduce a scalable and interpretable Bayesian linear model for multi-source heterogeneous data that uses a sparsity-inducing spike-and-slab prior with a Laplace slab and a Dirac spike. To address the computational challenges posed by the posterior, we implement a mean-field variational approximation based on a factorizable family of spike-and-slab distributions. Our method avoids the high computational cost of Gibbs sampling while retaining key benefits of the Bayesian approach: it provides a posterior distribution for the parameters and a natural mechanism for variable selection via posterior inclusion probabilities. Through simulation studies and an application to five real-world cancer datasets from The Cancer Genome Atlas (TCGA), we demonstrate the effectiveness of our variational Bayesian approach. Our results show the advantages of our method in computational efficiency and scalability over Gibbs sampling and over penalized frequentist methods for integrative analysis, making it well suited to analyzing high-dimensional multi-source heterogeneous data. The proposed variational Bayesian algorithms have been implemented in the R package VBMS, which is publicly available on CRAN.
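As a rough illustration of the variational idea described in the abstract, the Python sketch below runs a generic coordinate-ascent mean-field update for a spike-and-slab prior with a Dirac spike and a Laplace slab. It is not the VBMS algorithm. It covers only a single-source linear model with no shared/source-specific decomposition, and it assumes unit noise variance and fixed hyperparameters. The names `lam` (Laplace rate), `w` (prior inclusion weight), `cavi_spike_slab`, and the bisection-based inner solver are all hypothetical, illustrative choices. Each coordinate keeps a Gaussian slab N(mu_j, s_j^2) with weight gamma_j. That weight is the posterior inclusion probability used for variable selection.

```python
import math

import numpy as np


def expected_abs(m, s):
    """E|theta| for theta ~ N(m, s^2), i.e. the mean of a folded normal."""
    t = m / s
    return s * math.sqrt(2.0 / math.pi) * math.exp(-0.5 * t * t) + m * math.erf(t / math.sqrt(2.0))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bisect(fn, lo, hi, iters=60):
    """Root of a nondecreasing function fn with fn(lo) <= 0 <= fn(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fn(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def cavi_spike_slab(X, y, lam=1.0, w=0.1, n_iter=30, inner=5):
    """Hypothetical single-source sketch (not VBMS):
    model   y = X theta + eps, eps ~ N(0, I)  (unit noise variance assumed)
    prior   theta_j ~ w * Laplace(lam) + (1 - w) * delta_0
    q       theta_j ~ gamma_j * N(mu_j, s_j^2) + (1 - gamma_j) * delta_0
    Returns (mu, s, gamma); gamma holds posterior inclusion probabilities."""
    p = X.shape[1]
    G = X.T @ X                       # Gram matrix, formed once
    Xty = X.T @ y
    mu, s, gamma = np.zeros(p), np.ones(p), np.full(p, w)
    logit_w = math.log(w / (1.0 - w))
    for _ in range(n_iter):
        for j in range(p):
            a, b = float(G[j, j]), float(Xty[j])
            # c = sum over k != j of G[j, k] * gamma_k * mu_k
            c = float(G[j] @ (gamma * mu)) - a * gamma[j] * mu[j]
            m, sd = mu[j], s[j]
            # Minimise the jointly convex slab objective
            #   f(m, sd) = (c - b) m + a (m^2 + sd^2) / 2 - log sd + lam * E|theta|
            # by alternating 1-D bisections on its two partial derivatives.
            for _ in range(inner):
                m = bisect(lambda u: c - b + a * u + lam * math.erf(u / (sd * math.sqrt(2.0))),
                           (b - c - lam) / a, (b - c + lam) / a)
                sd = math.exp(bisect(
                    lambda v: a * math.exp(v) - math.exp(-v)
                    + lam * math.sqrt(2.0 / math.pi) * math.exp(-0.5 * (m / math.exp(v)) ** 2),
                    math.log(1e-10), -0.5 * math.log(a)))
            # logit(gamma_j) = logit(w) - [expected loss terms + KL(N(m, sd^2) || Laplace(lam))]
            A = ((c - b) * m + 0.5 * a * (m * m + sd * sd)
                 - 0.5 * math.log(2.0 * math.pi * math.e * sd * sd)
                 - math.log(lam / 2.0) + lam * expected_abs(m, sd))
            mu[j], s[j], gamma[j] = m, sd, sigmoid(logit_w - A)
    return mu, s, gamma


# Usage on simulated data: three true signals among p = 20 predictors.
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
theta = np.zeros(p)
theta[:3] = [3.0, -2.0, 1.5]
y = X @ theta + rng.standard_normal(n)
mu, s, gamma = cavi_spike_slab(X, y)
selected = np.flatnonzero(gamma > 0.5)   # select by posterior inclusion probability
print(selected, np.round(mu[:3], 2))
```

Thresholding `gamma` at 0.5 mirrors selection via posterior inclusion probabilities. Each sweep is deterministic and costs O(p^2) once X^T X is formed, so no Markov chain has to be run to convergence. This is the usual source of the speed-up over Gibbs sampling.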
Liu et al. studied this question.