
Density estimation for compositional data using nonparametric mixtures

Published: October 8, 2025 | arXiv ID: 2510.07608v1

By: Jiajin Xie, Yong Wang, Eduardo García-Portugués

Potential Business Impact:

Helps computers estimate the distribution of proportion data that contain zero or near-zero values.

Business Areas:
Analytics, Data and Analytics

Compositional data, which represent proportions constrained to the simplex, arise in diverse fields such as geosciences, ecology, genomics, and microbiome research. Existing nonparametric density estimation methods often rely on transformations, which may induce substantial bias near the simplex boundary. We propose a nonparametric mixture-based framework for density estimation on compositions. Nonparametric Dirichlet mixtures are employed to naturally accommodate boundary values, thereby avoiding transformations or zero replacement, while also identifying components supported on the boundary. This provides reliable estimates for data with zero or near-zero values. Bandwidth selection and initialization schemes are addressed. For comparison, nonparametric Gaussian mixtures coupled with log-ratio transformations are also considered. Extensive simulations show that the proposed estimators outperform existing approaches. Three real-data applications (GDP data analysis, handwritten digit recognition, and skin detection) demonstrate the usefulness of nonparametric Dirichlet mixtures in practice.
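The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea of a Dirichlet mixture density on the simplex, under stated assumptions. Each component is a Dirichlet whose mean sits at a candidate support point and whose concentration is 1/h, with h acting as a bandwidth (a hypothetical parametrization). The mixing weights are fitted by plain EM over a fixed set of candidate points, a simple stand-in and not the authors' fitting algorithm. The helper names (`dirichlet_logpdf`, `component_alpha`, `fit_weights`, `mixture_density`) are illustrative. The sketch handles strictly positive compositions only: the boundary-supported components described above are not modeled.

```python
import math
from typing import List, Sequence


def dirichlet_logpdf(x: Sequence[float], alpha: Sequence[float]) -> float:
    """Log density of Dirichlet(alpha) at a composition x (all parts > 0, summing to 1)."""
    return (math.lgamma(sum(alpha))
            - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x)))


def component_alpha(center: Sequence[float], h: float) -> List[float]:
    """Hypothetical parametrization: mean `center`, concentration 1/h (small h = narrow kernel)."""
    return [c / h for c in center]


def fit_weights(data: Sequence[Sequence[float]],
                centers: Sequence[Sequence[float]],
                h: float, n_iter: int = 200) -> List[float]:
    """EM for the mixing weights over a fixed set of candidate centers."""
    m = len(centers)
    w = [1.0 / m] * m
    # Component densities f_j(x_i) do not change across iterations, so precompute them.
    dens = [[math.exp(dirichlet_logpdf(x, component_alpha(c, h))) for c in centers]
            for x in data]
    for _ in range(n_iter):
        new_w = [0.0] * m
        for row in dens:
            mix = sum(wj * fj for wj, fj in zip(w, row))
            # E-step: posterior responsibility of component j for this observation.
            for j in range(m):
                new_w[j] += w[j] * row[j] / mix
        # M-step: average responsibilities; the weights stay on the probability simplex.
        w = [v / len(data) for v in new_w]
    return w


def mixture_density(x: Sequence[float], centers: Sequence[Sequence[float]],
                    weights: Sequence[float], h: float) -> float:
    """Evaluate the fitted Dirichlet mixture density at composition x."""
    return sum(wj * math.exp(dirichlet_logpdf(x, component_alpha(c, h)))
               for c, wj in zip(centers, weights))


if __name__ == "__main__":
    # Toy three-part compositions with two visible clusters.
    data = [(0.2, 0.3, 0.5), (0.25, 0.25, 0.5), (0.6, 0.2, 0.2), (0.55, 0.25, 0.2)]
    w = fit_weights(data, data, h=0.05)
    print([round(v, 3) for v in w])
    print(mixture_density((0.22, 0.28, 0.5), data, w, 0.05))
```

Using the observations themselves as candidate centers keeps the sketch short. Applying a log-ratio transformation followed by a Gaussian mixture would be the comparison route mentioned above, but it requires the zero replacement that the Dirichlet approach is meant to avoid.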

Country of Origin
🇳🇿 New Zealand


Page Count
26 pages

Category
Statistics: Methodology