Sparse Bayesian Partially Identified Models for Sequence Count Data
By: Won Gu, Francesca Chiaromonte, Justin D. Silverman
In genomics, differential abundance and expression analyses are complicated by the compositional nature of sequence count data, which reflect only relative, not absolute, abundances or expression levels. Many existing methods attempt to address this limitation through data normalizations, but we have shown that such approaches imply strong, often biologically implausible assumptions about total microbial load or total gene expression. Even modest violations of these assumptions can inflate Type I and Type II error rates to over 70%. Sparse estimators have been proposed as an alternative, leveraging the assumption that only a small subset of taxa (or genes) change between conditions. However, we show that current sparse methods suffer from similar pathologies because they treat sparsity assumptions as fixed and ignore the uncertainty inherent in these assumptions. We introduce a sparse Bayesian Partially Identified Model (PIM) that addresses this limitation by explicitly modeling uncertainty in sparsity assumptions. Our method extends the Scale-Reliant Inference (SRI) framework to the sparse setting, providing a principled approach to differential analysis under scale uncertainty. We establish theoretical consistency of the proposed estimator and, through extensive simulations and real data analyses, demonstrate substantial reductions in both Type I and Type II errors compared to existing methods.
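The abstract's central point can be seen in one identity. Write W for absolute abundances, p for proportions, and T for the total. Then for taxon i, log(W_b,i / W_a,i) = log(p_b,i / p_a,i) + log(T_b / T_a). Count data identify only the first term, so every method must supply the second. The toy sketch below contrasts three choices. A centered log-ratio (CLR) normalization fixes the scale shift at the mean relative log fold change. A fixed sparse estimator fixes it at the median. A scale-uncertain variant draws the shift from Normal(median, tau). All function names, the parameter `tau`, the Normal model, and the abundance data are hypothetical illustrations. This is not the authors' sparse Bayesian PIM or any SRI software, only a stand-in showing why treating the sparsity assumption as uncertain matters.

```python
import math
import random
import statistics


def proportions(w):
    """Sequencing reveals only relative abundances: each value divided by the total."""
    total = sum(w)
    return [x / total for x in w]


def relative_lfc(p_a, p_b):
    """Per-taxon log fold change computed from proportions (total scale unknown)."""
    return [math.log(y / x) for x, y in zip(p_a, p_b)]


def clr_lfc(p_a, p_b):
    """CLR normalization: subtract the mean relative LFC.
    Implicitly assumes the geometric mean of absolute abundances is unchanged."""
    lfc = relative_lfc(p_a, p_b)
    centre = statistics.fmean(lfc)
    return [d - centre for d in lfc]


def sparse_lfc(p_a, p_b):
    """Fixed sparsity assumption: most taxa do not change, so the median
    relative LFC is treated as the exact log change in total scale."""
    lfc = relative_lfc(p_a, p_b)
    shift = statistics.median(lfc)
    return [d - shift for d in lfc]


def sparse_lfc_intervals(p_a, p_b, tau=1.0, n_draws=2000, seed=0):
    """Hypothetical scale-uncertain variant: the log scale shift is drawn from
    Normal(median, tau) instead of being fixed at the median. One shift is
    shared by all taxa within each draw. Returns (2.5%, 97.5%) intervals."""
    rng = random.Random(seed)
    lfc = relative_lfc(p_a, p_b)
    centre = statistics.median(lfc)
    shifts = [rng.gauss(centre, tau) for _ in range(n_draws)]
    intervals = []
    for d in lfc:
        # 39 cut points at 1/40, 2/40, ..., 39/40; keep the outer two.
        cuts = statistics.quantiles([d - s for s in shifts], n=40)
        intervals.append((cuts[0], cuts[-1]))
    return intervals


# Invented absolute abundances for 7 taxa in conditions A and B.
A = [100] * 7
B_SPARSE = [800, 400, 100, 100, 100, 100, 100]  # 2 of 7 taxa change
B_DENSE = [100, 100, 100, 300, 300, 300, 300]   # 4 of 7 taxa change

if __name__ == "__main__":
    pa = proportions(A)
    for name, b in [("sparse truth", B_SPARSE), ("dense truth", B_DENSE)]:
        pb = proportions(b)
        truth = [math.log(y / x) for x, y in zip(A, b)]
        print(name)
        print("  true LFC  ", [round(t, 3) for t in truth])
        print("  CLR LFC   ", [round(t, 3) for t in clr_lfc(pa, pb)])
        print("  sparse LFC", [round(t, 3) for t in sparse_lfc(pa, pb)])
        print("  intervals ", [(round(lo, 2), round(hi, 2))
                               for lo, hi in sparse_lfc_intervals(pa, pb)])
```

When only two taxa change, the fixed sparse estimate recovers the true log fold changes. CLR reports the five unchanged taxa as decreased by log(32)/7, about 0.495. When four of seven taxa change, the sparsity assumption fails. The fixed sparse estimate then reports the three unchanged taxa as decreased by log 3 (about 1.10) and the four tripled taxa as unchanged, producing both false positives and false negatives. With `tau = 1.0`, each interval instead widens enough to cover zero for those taxa rather than making a confident wrong call. Shrinking `tau` toward zero recovers the fixed sparse estimator, so `tau` acts as a dial on how strongly the sparsity assumption is trusted.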
Similar Papers
A Bayesian Semiparametric Mixture Model for Clustering Zero-Inflated Microbiome Data
Methodology
Finds hidden groups in gut germs for health.
A multi-stage Bayesian approach to fit spatial point process models
Methodology
Predicts seal numbers in hidden fjord areas quickly.
Sparse and nonparametric estimation of equations governing dynamical systems with applications to biology
Machine Learning (CS)
Finds hidden rules in nature's complex systems.