Dissecting Microbial Community Structure and Heterogeneity via Multivariate Covariate-Adjusted Clustering

Published: August 14, 2025 | arXiv ID: 2508.11036v1

By: Zhongmao Liu, Xiaohui Yin, Yanjiao Zhou and more

Potential Business Impact:

Finds microbiome subtypes linked to clinical characteristics.

In microbiome studies, it is often of great interest to identify clusters or partitions of microbiome profiles within a study population and to characterize the distinctive attributes of each resulting microbial community. While raw counts or relative compositions are commonly used for such analysis, variations between clusters may be driven or distorted by subject-level covariates, reflecting underlying biological and clinical heterogeneity across individuals. Simultaneously detecting latent communities and identifying covariates that differentiate them can enhance our understanding of the microbiome and its association with health outcomes. To this end, we propose a Dirichlet-multinomial mixture regression (DMMR) model that enables joint clustering of microbiome profiles while accounting for covariates with either homogeneous or heterogeneous effects across clusters. A novel symmetric link function is introduced to facilitate covariate modeling through the compositional parameters. We develop efficient algorithms with convergence guarantees for parameter estimation and establish theoretical properties of the proposed estimators. Extensive simulation studies demonstrate the effectiveness of the method in clustering, feature selection, and heterogeneity detection. We illustrate the utility of DMMR through a comprehensive application to upper-airway microbiota data from a pediatric asthma study, uncovering distinct microbial subtypes and their associations with clinical characteristics.
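
To make the modeling idea concrete, here is a minimal sketch of how a Dirichlet-multinomial mixture with covariate-dependent parameters could assign one sample to clusters. It is not the authors' implementation: the abstract does not specify the symmetric link function, so this sketch substitutes a plain log link as an illustrative assumption, and the function names (`dm_logpmf`, `alpha_from_covariates`, `responsibilities`) and all parameter values are hypothetical.

```python
import math


def dm_logpmf(counts, alpha):
    """Log-probability of one count vector under a Dirichlet-multinomial."""
    n = sum(counts)
    a0 = sum(alpha)
    # Multinomial coefficient: n! / prod(x_j!)
    out = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Ratio of Dirichlet normalizing constants
    out += math.lgamma(a0) - math.lgamma(n + a0)
    out += sum(math.lgamma(x + a) - math.lgamma(a) for x, a in zip(counts, alpha))
    return out


def alpha_from_covariates(intercepts, slopes, x):
    """Map covariates x to positive Dirichlet parameters, one per taxon.

    ASSUMPTION: a plain log link, alpha_j = exp(b0_j + b_j . x).
    The paper's symmetric link is not given in the abstract and is NOT used here.
    """
    return [
        math.exp(b0 + sum(b * xi for b, xi in zip(bs, x)))
        for b0, bs in zip(intercepts, slopes)
    ]


def responsibilities(counts, x, weights, intercepts, slopes):
    """Posterior cluster probabilities for one sample (an E-step style quantity)."""
    logs = [
        math.log(w) + dm_logpmf(counts, alpha_from_covariates(b0, bs, x))
        for w, b0, bs in zip(weights, intercepts, slopes)
    ]
    m = max(logs)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(v - m) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy setup: 2 clusters, 2 taxa, 1 covariate with zero effect.
weights = [0.5, 0.5]
intercepts = [[2.0, 0.0], [0.0, 2.0]]   # cluster 0 favors taxon 0, cluster 1 favors taxon 1
slopes = [[[0.0], [0.0]], [[0.0], [0.0]]]
r = responsibilities([9, 1], [1.0], weights, intercepts, slopes)
```

In this toy setup the sample dominated by the first taxon receives the larger responsibility for the first cluster. Letting the slopes differ across clusters would correspond to the heterogeneous covariate effects the abstract mentions, while sharing them across clusters would correspond to homogeneous effects.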

Country of Origin
🇺🇸 United States

Page Count
52 pages

Category
Statistics: Methodology