Microbiome compositional data are often high-dimensional and sparse, and exhibit pervasive cross-sample heterogeneity. We introduce the "logistic-tree normal" (LTN) model, a generative model that allows flexible covariance among microbiome taxa, enables scalable computation, and effectively captures other key characteristics of microbiome compositional data such as the abundance of zeros. LTN incorporates a tree-based decomposition for effective aggregation over sparse taxa counts and models the relative abundance at the tree splits jointly using a (multivariate) logistic-normal distribution. The latent Gaussian structure allows a wide range of multivariate analysis and modeling tools for high-dimensional data, such as those enforcing sparsity or low-rank assumptions on the covariance structure, to be readily incorporated. As a general-purpose, fully generative model, LTN can be applied in a wide range of contexts, while efficient computational recipes for Bayesian inference under LTN are available through conjugate blocked Gibbs sampling enabled by Pólya-Gamma augmentation. We demonstrate the use of LTN in a compositional mixed-effects model for differential abundance analysis through both numerical experiments and a reanalysis of the infant cohort in the DIABIMMUNE study. We explain, and showcase through numerical experiments and the case study, how LTN, by adequately accounting for cross-sample heterogeneity, is capable of generating the appropriate proportion of zeros without an explicit zero-inflation component. This confirms a recent viewpoint that "zero-inflation" in count-based sequencing data is often the result of unaccounted-for cross-sample variation.
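The generative structure described above can be illustrated with a minimal simulation sketch. It assumes a hypothetical balanced binary tree over four taxa, with one log-odds parameter per internal node drawn jointly from a multivariate normal; the function name `simulate_ltn_counts`, the tree shape, and all parameter values are illustrative assumptions, not taken from the paper, and no inference (Gibbs sampling or Pólya-Gamma augmentation) is attempted.

```python
import numpy as np

def simulate_ltn_counts(total, mu, sigma, rng):
    """Draw leaf counts for one sample from a toy LTN-style model.

    Assumed tree (hypothetical): root splits {taxon 0, 1} vs {taxon 2, 3};
    the left node splits taxon 0 vs 1; the right node splits taxon 2 vs 3.
    """
    # Latent Gaussian log-odds, one per internal node (root, left, right).
    psi = rng.multivariate_normal(mu, sigma)
    # Logistic transform: probability of sending a read to the left child.
    theta = 1.0 / (1.0 + np.exp(-psi))
    # Binomial split of the total count at each internal node.
    left = rng.binomial(total, theta[0])
    right = total - left
    taxon0 = rng.binomial(left, theta[1])
    taxon2 = rng.binomial(right, theta[2])
    return np.array([taxon0, left - taxon0, taxon2, right - taxon2])

rng = np.random.default_rng(0)
mu = np.array([0.0, -2.0, 1.0])             # illustrative mean log-odds
sigma = np.array([[1.0, 0.5, 0.0],          # illustrative covariance across nodes
                  [0.5, 4.0, 0.0],
                  [0.0, 0.0, 1.0]])

# 200 samples with 1000 reads each; each sample gets its own latent draw,
# which is the source of cross-sample heterogeneity in this sketch.
counts = np.stack([simulate_ltn_counts(1000, mu, sigma, rng) for _ in range(200)])
zero_fraction = (counts == 0).mean()
print(counts[:5])
print("fraction of zero counts:", zero_fraction)
```

In a sketch like this, the only source of zeros is the sample-to-sample variation in the latent log-odds combined with binomial sampling; increasing the diagonal of `sigma` would be one way to explore how that variation relates to the proportion of zeros.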
Wang et al. (Fri,) studied this question.