Bayesian nonparametric models for zero-inflated count-compositional data using ensembles of regression trees
By: André F. B. Menezes, Andrew C. Parnell, Keefe Murphy
Count-compositional data arise in many different fields, including high-throughput microbiome sequencing and palynology experiments, where a common, important goal is to understand how covariates relate to the observed compositions. Existing methods often fail to simultaneously address key challenges inherent in such data, namely: overdispersion, an excess of zeros, cross-sample heterogeneity, and nonlinear covariate effects. To address these concerns, we propose novel Bayesian models based on ensembles of regression trees. Specifically, we leverage the recently introduced zero-and-$N$-inflated multinomial distribution and assign independent nonparametric Bayesian additive regression tree (BART) priors to both the compositional and structural zero probability components of our model, to flexibly capture covariate effects. We further extend this by adding latent random effects to capture overdispersion and more general dependence structures among the categories. We develop an efficient inferential algorithm combining recent data augmentation schemes with established BART sampling routines. We evaluate our proposed models in simulation studies and illustrate their applicability with two case studies in microbiome and palaeoclimate modelling.
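The abstract describes a hierarchy: BART ensembles drive both the compositional probabilities and the structural-zero probabilities, and these then feed a zero-and-$N$-inflated multinomial. The sketch below is a minimal forward simulation of that structure, not the authors' implementation. Several pieces are hypothetical stand-ins chosen for illustration. Each "tree" is a fixed depth-1 stump rather than a tree sampled by MCMC. A softmax link maps the ensemble outputs to compositional probabilities, and a probit link maps them to structural-zero probabilities. The distribution is parameterized so that each category is independently a structural zero, after which the $N$ trials are allocated multinomially over the remaining categories. The names `sum_of_trees`, `softmax`, `probit`, and `sample_zanim` are illustrative only.

```python
import math
import random
from typing import List, Sequence, Tuple

# A depth-1 tree ("stump"): (feature index, split point, left leaf, right leaf).
# Hypothetical simplification: real BART trees can be deeper and are sampled by MCMC.
Stump = Tuple[int, float, float, float]


def sum_of_trees(x: Sequence[float], trees: List[Stump]) -> float:
    """BART-style prediction: add up the leaf value each tree assigns to x."""
    total = 0.0
    for feature, split, left, right in trees:
        total += left if x[feature] <= split else right
    return total


def softmax(scores: Sequence[float]) -> List[float]:
    """Map unconstrained scores to compositional probabilities (sum to 1)."""
    m = max(scores)
    e = [math.exp(s - m) for s in scores]  # shift by max for numerical stability
    z = sum(e)
    return [v / z for v in e]


def probit(score: float) -> float:
    """Standard normal CDF, used here (assumed link) to map a score to a zero probability."""
    return 0.5 * (1.0 + math.erf(score / math.sqrt(2.0)))


def sample_zanim(n_total: int, theta: Sequence[float], zeta: Sequence[float],
                 rng: random.Random) -> Tuple[List[int], List[bool]]:
    """Draw counts from an ASSUMED zero-and-N-inflated multinomial form.

    Category j is a structural zero with probability zeta[j]. The n_total trials
    are then spread over the active categories with theta renormalized. Draws
    with no active category are rejected (an assumption of this sketch), so
    not every zeta[j] may equal 1. If exactly one category is active, it
    receives all n_total counts, which is the "N-inflation".
    """
    J = len(theta)
    while True:
        active = [rng.random() >= zeta[j] for j in range(J)]
        if any(active):
            break
    weights = [theta[j] if active[j] else 0.0 for j in range(J)]
    counts = [0] * J
    # random.choices normalizes the weights, so inactive categories get no trials.
    for j in rng.choices(range(J), weights=weights, k=n_total):
        counts[j] += 1
    return counts, active


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.3, 1.2]  # hypothetical covariate vector
    # One hypothetical ensemble per category for each component.
    f_trees = [[(0, 0.5, 0.2, -0.1)], [(1, 1.0, -0.3, 0.4)], [(0, 0.0, 0.1, 0.1)]]
    g_trees = [[(1, 0.5, -1.0, -2.0)], [(0, 0.5, -0.5, 0.0)], [(1, 2.0, -1.5, -1.5)]]
    theta = softmax([sum_of_trees(x, t) for t in f_trees])
    zeta = [probit(sum_of_trees(x, t)) for t in g_trees]
    counts, active = sample_zanim(100, theta, zeta, rng)
    print(theta, zeta, counts, active)
```

Keeping the two ensembles separate mirrors the abstract's point that covariates can affect the compositional and structural-zero components differently. The latent random effects mentioned for overdispersion are omitted here, and the stumps are fixed rather than inferred.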