Anomaly Detection and Improvement of Clusters using Enhanced K-Means Algorithm
By: Vardhan Shorewala, Shivam Shorewala
Potential Business Impact:
Detects anomalous data points and produces tighter, more accurate clusters of similar data.
This paper introduces a unified approach to cluster refinement and anomaly detection in datasets. We propose a novel algorithm that iteratively reduces the intra-cluster variance of N clusters until a global minimum is reached, yielding tighter clusters than the standard k-means algorithm. We evaluate the method using intrinsic measures for unsupervised learning, including the silhouette coefficient, Calinski-Harabasz index, and Davies-Bouldin index, and extend it to anomaly detection by identifying points whose assignment causes a significant variance increase. External validation on synthetic data and the UCI Breast Cancer and UCI Wine Quality datasets employs the Jaccard similarity score, V-measure, and F1 score. Results show variance reductions of 18.7% and 88.1% on the synthetic and Wine Quality datasets, respectively, along with accuracy and F1 score improvements of 22.5% and 20.8% on the Wine Quality dataset.
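The abstract does not spell out the algorithm's details, but the core idea (fit clusters, then flag points whose cluster assignment contributes an outsized share of intra-cluster variance) can be sketched with standard tools. The following is a minimal illustrative sketch, not the authors' exact method: it uses scikit-learn's k-means, treats each point's squared distance to its centroid as its variance contribution, applies an illustrative two-standard-deviation cutoff per cluster, and reports the intrinsic measures named in the abstract before and after removing flagged points.

```python
# Minimal sketch (not the authors' exact algorithm): flag points whose
# contribution to intra-cluster variance is unusually large after a
# standard k-means fit. The synthetic data and the 2-sigma cutoff are
# illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (silhouette_score, calinski_harabasz_score,
                             davies_bouldin_score)

# Synthetic data standing in for the datasets discussed in the paper.
X, _ = make_blobs(n_samples=500, centers=3, cluster_std=1.2, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels, centers = km.labels_, km.cluster_centers_

# Per-point contribution to intra-cluster variance: squared distance
# to the assigned centroid.
sq_dist = np.sum((X - centers[labels]) ** 2, axis=1)

# Flag points far above their cluster's typical contribution.
anomalies = np.zeros(len(X), dtype=bool)
for c in range(km.n_clusters):
    in_c = labels == c
    cutoff = sq_dist[in_c].mean() + 2.0 * sq_dist[in_c].std()  # illustrative
    anomalies[in_c] = sq_dist[in_c] > cutoff

# Intrinsic cluster-quality measures cited in the abstract, computed on
# all points and on the points that remain after dropping the flagged ones.
for name, score in [("silhouette", silhouette_score),
                    ("Calinski-Harabasz", calinski_harabasz_score),
                    ("Davies-Bouldin", davies_bouldin_score)]:
    print(f"{name}: all={score(X, labels):.3f} "
          f"kept={score(X[~anomalies], labels[~anomalies]):.3f}")
```

Under this sketch, tighter clusters should show a higher silhouette and Calinski-Harabasz score and a lower Davies-Bouldin index once the flagged points are removed; the paper's reported variance and F1 improvements come from its own refinement procedure, not from this simplified cutoff.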
Similar Papers
A Computational Approach to Improving Fairness in K-means Clustering
Machine Learning (CS)
Makes k-means cluster assignments fairer across groups.
Estimating the Optimal Number of Clusters in Categorical Data Clustering by Silhouette Coefficient
Machine Learning (CS)
Estimates the best number of clusters using the silhouette coefficient.
Modified K-means Algorithm with Local Optimality Guarantees
Machine Learning (CS)
Makes k-means clustering more accurate, with local optimality guarantees.