Anomaly Detection and Improvement of Clusters using Enhanced K-Means Algorithm
By: Vardhan Shorewala, Shivam Shorewala
Potential Business Impact:
Detects anomalous data points and produces tighter, more accurate clusters of similar data.
This paper introduces a unified approach to cluster refinement and anomaly detection in datasets. We propose a novel algorithm that iteratively reduces the intra-cluster variance of N clusters until a global minimum is reached, yielding tighter clusters than the standard k-means algorithm. We evaluate the method using intrinsic measures for unsupervised learning, including the silhouette coefficient, Calinski-Harabasz index, and Davies-Bouldin index, and extend it to anomaly detection by identifying points whose assignment causes a significant variance increase. External validation on synthetic data and the UCI Breast Cancer and UCI Wine Quality datasets employs the Jaccard similarity score, V-measure, and F1 score. Results show variance reductions of 18.7% and 88.1% on the synthetic and Wine Quality datasets, respectively, along with accuracy and F1 score improvements of 22.5% and 20.8% on the Wine Quality dataset.
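The abstract does not spell out the algorithm's details, but the core idea (fit clusters, then flag points whose cluster assignment contributes an outsized share of intra-cluster variance) can be sketched with standard tools. The following is a minimal illustrative sketch, not the authors' exact method: it uses scikit-learn's k-means, treats each point's squared distance to its centroid as its variance contribution, applies an illustrative two-standard-deviation cutoff per cluster, and reports the intrinsic measures named in the abstract before and after removing flagged points.

```python
# Minimal sketch (not the authors' exact algorithm): flag points whose
# contribution to intra-cluster variance is unusually large after a
# standard k-means fit. The synthetic data and the 2-sigma cutoff are
# illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (silhouette_score, calinski_harabasz_score,
                             davies_bouldin_score)

# Synthetic data standing in for the datasets discussed in the paper.
X, _ = make_blobs(n_samples=500, centers=3, cluster_std=1.2, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels, centers = km.labels_, km.cluster_centers_

# Per-point contribution to intra-cluster variance: squared distance
# to the assigned centroid.
sq_dist = np.sum((X - centers[labels]) ** 2, axis=1)

# Flag points far above their cluster's typical contribution.
anomalies = np.zeros(len(X), dtype=bool)
for c in range(km.n_clusters):
    in_c = labels == c
    cutoff = sq_dist[in_c].mean() + 2.0 * sq_dist[in_c].std()  # illustrative
    anomalies[in_c] = sq_dist[in_c] > cutoff

# Intrinsic cluster-quality measures cited in the abstract, computed on
# all points and on the points that remain after dropping the flagged ones.
for name, score in [("silhouette", silhouette_score),
                    ("Calinski-Harabasz", calinski_harabasz_score),
                    ("Davies-Bouldin", davies_bouldin_score)]:
    print(f"{name}: all={score(X, labels):.3f} "
          f"kept={score(X[~anomalies], labels[~anomalies]):.3f}")
```

Under this sketch, tighter clusters should show a higher silhouette and Calinski-Harabasz score and a lower Davies-Bouldin index once the flagged points are removed; the paper's reported variance and F1 improvements come from its own refinement procedure, not from this simplified cutoff.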
Similar Papers
A Computational Approach to Improving Fairness in K-means Clustering
Machine Learning (CS)
Makes k-means cluster assignments fairer across groups.
Estimating the Optimal Number of Clusters in Categorical Data Clustering by Silhouette Coefficient
Machine Learning (CS)
Estimates the best number of clusters using the silhouette coefficient.
Modified K-means Algorithm with Local Optimality Guarantees
Machine Learning (CS)
Makes k-means clustering more accurate, with local optimality guarantees.