Silhouette-Guided Instance-Weighted k-means
By: Aggelos Semoglou, Aristidis Likas, John Pavlopoulos
Potential Business Impact:
Improves automatic data clustering by down-weighting poorly clustered or noisy points.
Clustering is a fundamental unsupervised learning task with numerous applications across diverse fields. Popular algorithms such as k-means often struggle with outliers or imbalanced data, which leads to distorted centroids and suboptimal partitions. We introduce K-Sil, a silhouette-guided refinement of the k-means algorithm that weights each point by its silhouette score, prioritizing well-clustered instances while suppressing borderline or noisy regions. The algorithm emphasizes a user-specified silhouette aggregation metric (macro-averaged, micro-averaged, or a combination of both) through self-tuning weighting schemes, supported by appropriate sampling strategies and scalable approximations. These components ensure computational efficiency and adaptability to diverse dataset geometries. Theoretical guarantees establish centroid convergence, and empirical validation on synthetic and real-world datasets demonstrates statistically significant improvements in silhouette scores over k-means and two other instance-weighted k-means variants. These results establish K-Sil as a principled alternative for applications that demand high-quality, well-separated clusters.
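The abstract does not give K-Sil's actual self-tuning weighting rule, sampling strategy, or scalable approximations, so the following is only a minimal Python sketch of the general idea under stated assumptions. Exact silhouettes are recomputed every iteration (O(n²), which the paper's sampling and approximations are meant to avoid). Each point's silhouette s_i is mapped to a weight with a hypothetical rule w_i = exp(s_i / temperature), and each centroid becomes the weighted mean of its members. All function names and the temperature, n_iter, init, and seed parameters are hypothetical illustrations, not the authors' code. Likewise, combined_silhouette is an assumed convex blend of the macro- and micro-averages.

```python
import math
import random


def silhouette_samples(points, labels):
    """Exact per-point silhouette s_i = (b_i - a_i) / max(a_i, b_i); O(n^2)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return [0.0] * len(points)  # silhouette is undefined for a single cluster
    scores = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a_i: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b_i: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def micro_silhouette(scores):
    """Micro-average: every point counts equally."""
    return sum(scores) / len(scores)


def macro_silhouette(scores, labels):
    """Macro-average: mean of the per-cluster mean silhouettes."""
    per_cluster = {}
    for s, lab in zip(scores, labels):
        per_cluster.setdefault(lab, []).append(s)
    return sum(sum(v) / len(v) for v in per_cluster.values()) / len(per_cluster)


def combined_silhouette(scores, labels, alpha=0.5):
    """Hypothetical convex blend of the macro and micro averages."""
    return alpha * macro_silhouette(scores, labels) + (1 - alpha) * micro_silhouette(scores)


def silhouette_weighted_kmeans(points, k, temperature=0.5, n_iter=50, init=None, seed=0):
    """k-means whose centroid update is a silhouette-weighted mean (illustrative only)."""
    if init is not None:
        centroids = [list(c) for c in init]
    else:
        centroids = [list(p) for p in random.Random(seed).sample(points, k)]
    dim = len(points[0])
    labels = None
    for _ in range(n_iter):
        # Assignment step: identical to standard k-means.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so stop
        labels = new_labels
        # Hypothetical weighting: w_i = exp(s_i / temperature) keeps weights positive
        # and gives well-clustered points (s_i near 1) more influence than noisy ones.
        weights = [math.exp(s / temperature) for s in silhouette_samples(points, labels)]
        # Update step: each centroid becomes the weighted mean of its members.
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:
                continue  # keep the previous centroid if a cluster empties
            total = sum(weights[i] for i in members)
            centroids[j] = [
                sum(weights[i] * points[i][d] for i in members) / total for d in range(dim)
            ]
    return centroids, labels
```

In a small hand-checked case (two tight 2-D blobs plus one distant outlier, k=2, init=[(0, 0), (10, 10)]), the outlier joins the nearer blob but gets a lower silhouette than that blob's members. Its weight is therefore smaller, and the cluster's centroid lands closer to the blob than the plain unweighted mean would. The exponential map was chosen because it keeps every weight strictly positive, so no centroid update divides by zero, while still ranking points by silhouette.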
Similar Papers
Estimating the Optimal Number of Clusters in Categorical Data Clustering by Silhouette Coefficient
Machine Learning (CS)
Finds the best number of groups in data.
CAS Condensed and Accelerated Silhouette: An Efficient Method for Determining the Optimal K in K-Means Clustering
Machine Learning (CS)
Finds the best number of groups in data much faster.
Anomaly Detection and Improvement of Clusters using Enhanced K-Means Algorithm
Machine Learning (CS)
Finds weird data points and groups similar data better.