A Bayesian Semiparametric Mixture Model for Clustering Zero-Inflated Microbiome Data
By: Suppapat Korsurat, Matthew D. Koslovsky
Potential Business Impact:
Finds hidden groups of people with similar gut microbes that may be linked to health.
Microbiome research has immense potential for unlocking insights into human health and disease. A common goal in human microbiome research is identifying subgroups of individuals with similar microbial composition that may be linked to specific health states or environmental exposures. However, existing clustering methods are often not equipped to accommodate the complex structure of microbiome data and typically make limiting assumptions regarding the number of clusters in the data, which can bias inference. We propose a novel Bayesian semiparametric mixture modeling framework, designed for the zero-inflated multivariate compositional count data collected in microbiome research, that learns the number of clusters in the data while simultaneously performing cluster allocation. In simulation, we compare the clustering performance of our method to distance- and model-based alternatives and demonstrate the importance of accommodating zero-inflation when it is present in the data. We then apply the model to identify clusters in microbiome data collected in a study designed to investigate the relation between gut microbial composition and enteric diarrheal disease.
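To make the data structure described above concrete, the following is a minimal sketch of how zero-inflated compositional count data with an unknown number of clusters could be simulated. It is not the authors' model: the abstract does not state which nonparametric prior or count likelihood they use, so the Chinese restaurant process, the multinomial likelihood, and every name and parameter here (`sample_crp_labels`, `simulate_zi_counts`, `alpha`, `zero_prob`, `depth`) are illustrative assumptions.

```python
import numpy as np


def sample_crp_labels(n, alpha, rng):
    """Draw cluster labels for n samples from a Chinese restaurant process.

    Assumption: a CRP is used here only as one common prior under which
    the number of clusters is random rather than fixed in advance.
    """
    labels = np.zeros(n, dtype=int)
    sizes = [1]  # the first sample opens cluster 0
    for i in range(1, n):
        # Existing clusters are weighted by size, a new cluster by alpha.
        weights = np.array(sizes + [alpha], dtype=float)
        k = int(rng.choice(len(weights), p=weights / weights.sum()))
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
        labels[i] = k
    return labels


def simulate_zi_counts(labels, n_taxa, depth, zero_prob, rng):
    """Simulate zero-inflated multinomial counts given cluster labels."""
    n_clusters = int(labels.max()) + 1
    # One composition (a point on the simplex) per cluster.
    cluster_props = rng.dirichlet(np.ones(n_taxa), size=n_clusters)
    counts = np.zeros((len(labels), n_taxa), dtype=int)
    for i, z in enumerate(labels):
        # Structural zeros: each taxon is absent with probability zero_prob.
        present = rng.random(n_taxa) > zero_prob
        if not present.any():
            present[rng.integers(n_taxa)] = True  # keep at least one taxon
        p = cluster_props[z] * present
        counts[i] = rng.multinomial(depth, p / p.sum())
    return counts


rng = np.random.default_rng(0)
labels = sample_crp_labels(n=50, alpha=1.0, rng=rng)
counts = simulate_zi_counts(labels, n_taxa=20, depth=1000, zero_prob=0.3, rng=rng)
```

Each row of `counts` sums to the fixed sequencing depth, so only the relative abundances carry information (the compositional constraint), and the masked taxa produce more zeros than the multinomial alone would generate. A clustering method that ignores those structural zeros would treat them as evidence of low abundance, which is the issue the abstract highlights.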
Similar Papers
A stochastic method to estimate a zero-inflated two-part mixed model for human microbiome data
Methodology
Tracks tiny gut bugs changing with health.
Truncated Gaussian copula principal component analysis with application to pediatric acute lymphoblastic leukemia patients' gut microbiome
Methodology
Finds gut bugs that predict cancer patient infections.
Dissecting Microbial Community Structure and Heterogeneity via Multivariate Covariate-Adjusted Clustering
Methodology
Finds gut bacteria groups linked to health.