Density estimation for compositional data using nonparametric mixtures
By: Jiajin Xie, Yong Wang, Eduardo García-Portugués
Potential Business Impact:
Helps computers understand data with zero values.
Compositional data, which represent proportions constrained to the simplex, arise in diverse fields such as geosciences, ecology, genomics, and microbiome research. Existing nonparametric density estimation methods often rely on transformations, which may induce substantial bias near the simplex boundary. We propose a nonparametric mixture-based framework for density estimation on compositions. Nonparametric Dirichlet mixtures naturally accommodate boundary values, so neither transformation nor zero replacement is needed. They also identify components supported on the boundary, which yields reliable estimates for data with zero or near-zero values. Bandwidth selection and initialization schemes are addressed. For comparison, nonparametric Gaussian mixtures coupled with log-ratio transformations are also considered. Extensive simulations show that the proposed estimators outperform existing approaches. Three real data applications (GDP data analysis, handwritten digit recognition, and skin detection) demonstrate the usefulness of nonparametric Dirichlet mixtures in practice.
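To illustrate why a Dirichlet mixture can be evaluated directly on compositions containing zeros while a log-ratio transformation cannot, here is a minimal sketch. It is not the authors' estimator: the weights and Dirichlet parameters are hypothetical values chosen for illustration, the helper names (`dirichlet_logpdf`, `dirichlet_mixture_pdf`, `alr`) are assumptions, and the paper's bandwidth selection, initialization, and boundary-supported components are not reproduced.

```python
import numpy as np
from scipy.special import gammaln, logsumexp, xlogy


def dirichlet_logpdf(x, alpha):
    """Log-density of Dirichlet(alpha) at composition(s) x (last axis sums to 1).

    xlogy(a, 0) is 0 when a == 0, so a component with alpha_j == 1 stays
    finite at x_j == 0; alpha_j > 1 gives log-density -inf (density 0) there.
    """
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    log_norm = gammaln(alpha.sum()) - gammaln(alpha).sum()
    return log_norm + xlogy(alpha - 1.0, x).sum(axis=-1)


def dirichlet_mixture_pdf(x, weights, alphas):
    """Density of a finite Dirichlet mixture: sum_k w_k * Dir(x; alpha_k)."""
    log_terms = np.stack(
        [np.log(w) + dirichlet_logpdf(x, a) for w, a in zip(weights, alphas)],
        axis=0,
    )
    # Combine components in log space for numerical stability.
    return np.exp(logsumexp(log_terms, axis=0))


def alr(x):
    """Additive log-ratio transform, using the last part as the reference."""
    x = np.asarray(x, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.log(x[..., :-1] / x[..., -1:])


# Hypothetical two-component mixture on the 3-part simplex.
weights = [0.5, 0.5]
alphas = [[1.0, 3.0, 4.0], [5.0, 2.0, 2.0]]

boundary_point = np.array([0.0, 0.4, 0.6])  # first part is exactly zero
print(dirichlet_mixture_pdf(boundary_point, weights, alphas))  # finite density
print(alr(boundary_point))  # contains -inf: the log-ratio is undefined at zero
```

In this sketch the first component has a parameter equal to 1 in the first coordinate, so it contributes a finite, positive density on the face where that part is zero. The log-ratio route would first require replacing the zero with a small positive value.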
Similar Papers
Density estimation with atoms, and functional estimation for mixed discrete-continuous data
Methodology
Helps computers understand mixed data better.
Dirichlet kernel density estimation for strongly mixing sequences on the simplex
Statistics Theory
Helps understand changing market shares over time.
Bayesian nonparametric modeling of mixed-type bounded data
Methodology
Helps understand mixed health data better.