A novel k-means clustering approach using two distance measures for Gaussian data
By: Naitik Gada
Potential Business Impact:
Finds hidden patterns in messy data more reliably.
Clustering algorithms have long been a topic of research, representing one of the more popular branches of unsupervised learning. Since clustering analysis is one of the best ways to find clarity and structure within raw data, this paper explores a novel approach to k-means clustering. We present a k-means clustering algorithm that uses both the within-cluster distance (WCD) and the inter-cluster distance (ICD) as its distance metric to partition the data into k clusters, with k pre-determined by the Calinski-Harabasz criterion, in order to produce a more robust clustering output. The idea behind this approach is that incorporating both measures makes the convergence of the data into their clusters more stable and robust. We run the algorithm on synthetically generated data as well as on benchmark data sets obtained from the UCI repository. The results show that the convergence of the data into their respective clusters is more accurate when both the WCD and ICD metrics are used. The algorithm also assigns outliers to their true clusters more often than the traditional k-means method. We also identify several interesting research topics that reveal themselves as we answer the questions we initially set out to address.
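To make the idea concrete, here is a minimal Python sketch of the two-step procedure the abstract describes: choose k with the Calinski-Harabasz criterion, then run a k-means variant whose assignment step trades off WCD against ICD. The alpha weighting, the specific way WCD and ICD are combined, and the helper names (choose_k, wcd_icd_kmeans) are assumptions made for illustration; the paper's actual formulation may differ.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

def choose_k(X, k_range=range(2, 11)):
    # Pick the k that maximizes the Calinski-Harabasz score of plain k-means.
    best_k, best_score = None, -np.inf
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        score = calinski_harabasz_score(X, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k

def wcd_icd_kmeans(X, k, alpha=0.5, n_iter=100, seed=0):
    # Assumed objective: assign each point to the centroid minimizing
    # alpha * WCD - (1 - alpha) * ICD, where WCD is the distance to the
    # candidate centroid and ICD is the mean distance to the other centroids.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Pairwise point-to-centroid distances, shape (n, k).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # Mean distance from each point to the other k-1 centroids.
        icd = (d.sum(axis=1, keepdims=True) - d) / (k - 1)
        labels = np.argmin(alpha * d - (1 - alpha) * icd, axis=1)
        # Standard centroid update; keep the old centroid if a cluster empties.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
k = choose_k(X)
labels, centroids = wcd_icd_kmeans(X, k)

Penalizing distance to the assigned centroid while rewarding distance to the other centroids is one plausible way to combine the two measures; points near cluster boundaries, including outliers, are pulled toward the centroid they are most distinctly separated from.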
Similar Papers
Beyond I-Con: Exploring New Dimension of Distance Measures in Representation Learning
Machine Learning (CS)
Finds better ways for computers to learn.
Clustering Approaches for Mixed-Type Data: A Comparative Study
Machine Learning (Stat)
Finds patterns in mixed-type data.
High-Dimensional BWDM: A Robust Nonparametric Clustering Validation Index for Large-Scale Data
Machine Learning (Stat)
Finds best groups in messy, big data.