Mass Distribution versus Density Distribution in the Context of Clustering

Published: January 14, 2026 | arXiv ID: 2601.10759v1

By: Kai Ming Ting, Ye Zhu, Hang Zhang and more

Potential Business Impact:

Finds groups in data without favoring dense ones.

Business Areas:
Big Data, Data and Analytics

This paper investigates two fundamental descriptors of data, density distribution and mass distribution, in the context of clustering. Density distribution has been the de facto descriptor of data distribution since the introduction of statistics. We show that density distribution has a fundamental limitation, high-density bias, which holds irrespective of the algorithm used to perform clustering. Existing density-based clustering algorithms have employed different algorithmic means to counter the effect of this bias with some success, but the fundamental limitation of using density distribution remains an obstacle to discovering clusters of arbitrary shapes, sizes and densities. Using mass distribution as a better foundation, we propose a new algorithm, called mass-maximization clustering (MMC), which maximizes the total mass of all clusters. The algorithm can easily be changed to maximize the total density of all clusters instead, which allows the fundamental limitation of density distribution to be examined against mass distribution. The key advantage of MMC over density-maximization clustering is that the maximization is conducted without a bias towards dense clusters.
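The contrast between the two descriptors can be illustrated with a toy calculation. The sketch below is not the MMC algorithm and does not use the paper's formal definitions. It assumes, purely for illustration, that the "mass" of a region is the fraction of points falling in it and that "density" is that fraction divided by the region's width. The function `region_stats` and the two synthetic clusters are hypothetical.

```python
# Illustrative only: "mass" is taken to be the fraction of points in a region,
# and "density" is that fraction divided by the region's width.
# These are simplifying assumptions, not the paper's formal definitions.

def region_stats(points, lo, hi):
    """Return (mass, density) of the 1-D interval [lo, hi)."""
    inside = sum(1 for p in points if lo <= p < hi)
    mass = inside / len(points)
    density = mass / (hi - lo)
    return mass, density

# Two hypothetical clusters with the same number of points:
# a tight one spanning [0, 1) and a spread-out one spanning [10, 20).
tight = [i / 100 for i in range(100)]        # 100 points in [0, 1)
sparse = [10 + i / 10 for i in range(100)]   # 100 points in [10, 20)
points = tight + sparse

tight_mass, tight_density = region_stats(points, 0.0, 1.0)
sparse_mass, sparse_density = region_stats(points, 10.0, 20.0)

print("tight :", tight_mass, tight_density)
print("sparse:", sparse_mass, sparse_density)
```

Under these assumptions both clusters carry the same mass (half of the points each), while the tight cluster's density is ten times that of the spread-out one. An objective that scores clusters by density would therefore rank the tight cluster far higher, whereas one that scores by mass would treat the two equally.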

Country of Origin
🇨🇳 China

Page Count
54 pages

Category
Statistics:
Machine Learning (Stat)