Dissecting Microbial Community Structure and Heterogeneity via Multivariate Covariate-Adjusted Clustering
By: Zhongmao Liu, Xiaohui Yin, Yanjiao Zhou and more
Potential Business Impact:
Finds groups of similar microbiome profiles and the patient characteristics that distinguish them.
In microbiome studies, it is often of great interest to identify clusters or partitions of microbiome profiles within a study population and to characterize the distinctive attributes of each resulting microbial community. While raw counts or relative compositions are commonly used for such analysis, variations between clusters may be driven or distorted by subject-level covariates, reflecting underlying biological and clinical heterogeneity across individuals. Simultaneously detecting latent communities and identifying covariates that differentiate them can enhance our understanding of the microbiome and its association with health outcomes. To this end, we propose a Dirichlet-multinomial mixture regression (DMMR) model that enables joint clustering of microbiome profiles while accounting for covariates with either homogeneous or heterogeneous effects across clusters. A novel symmetric link function is introduced to facilitate covariate modeling through the compositional parameters. We develop efficient algorithms with convergence guarantees for parameter estimation and establish theoretical properties of the proposed estimators. Extensive simulation studies demonstrate the effectiveness of the method in clustering, feature selection, and heterogeneity detection. We illustrate the utility of DMMR through a comprehensive application to upper-airway microbiota data from a pediatric asthma study, uncovering distinct microbial subtypes and their associations with clinical characteristics.
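To make the modeling idea concrete, the following is a minimal sketch of a Dirichlet-multinomial mixture with covariate-dependent parameters. It is not the authors' DMMR implementation. The abstract does not define the proposed symmetric link function, so a plain log link is substituted here as an assumption, and the function names (`dm_log_pmf`, `alpha_from_covariates`, `responsibilities`) and all parameter values are hypothetical. The sketch only shows how one sample's count vector and covariates could be turned into posterior cluster probabilities, as in the E-step of an EM-type algorithm. Giving each cluster its own slopes corresponds to heterogeneous covariate effects; sharing slopes across clusters would correspond to homogeneous effects.

```python
import math


def dm_log_pmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial."""
    n = sum(counts)
    a0 = sum(alpha)
    # Multinomial coefficient: n! / prod_j(x_j!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Gamma-function ratio from integrating out the Dirichlet prior
    log_ratio = math.lgamma(a0) - math.lgamma(a0 + n)
    log_ratio += sum(
        math.lgamma(x + a) - math.lgamma(a) for x, a in zip(counts, alpha)
    )
    return log_coef + log_ratio


def alpha_from_covariates(z, intercepts, slopes):
    """ASSUMED log link (not the paper's symmetric link):
    alpha_j = exp(intercept_j + slopes_j . z)."""
    return [
        math.exp(b0 + sum(b * zi for b, zi in zip(bj, z)))
        for b0, bj in zip(intercepts, slopes)
    ]


def responsibilities(counts, z, weights, params):
    """Posterior cluster probabilities for one sample (an E-step)."""
    log_terms = [
        math.log(w) + dm_log_pmf(counts, alpha_from_covariates(z, *p))
        for w, p in zip(weights, params)
    ]
    # Subtract the maximum before exponentiating for numerical stability
    m = max(log_terms)
    unnorm = [math.exp(t - m) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical example: 2 clusters, 3 taxa, 1 covariate.
# Each cluster has its own (intercepts, slopes): heterogeneous effects.
params = [
    ([2.0, 0.0, 0.0], [[0.5], [0.0], [0.0]]),  # cluster 0 favors taxon 0
    ([0.0, 0.0, 2.0], [[0.0], [0.0], [0.5]]),  # cluster 1 favors taxon 2
]
weights = [0.5, 0.5]
resp = responsibilities([30, 5, 2], [1.0], weights, params)
print(resp)
```

In this toy setup the sample is dominated by taxon 0, so the first cluster receives most of the posterior mass. A full fitting procedure would alternate this step with updates of the weights and regression coefficients, and the feature selection described in the abstract would require additional machinery that is not sketched here.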