Silhouette-Guided Instance-Weighted k-means

Published: June 15, 2025 | arXiv ID: 2506.12878v1

By: Aggelos Semoglou, Aristidis Likas, John Pavlopoulos

Potential Business Impact:

Improves automatic data grouping (clustering) by down-weighting noisy or borderline data points.

Business Areas:
Image Recognition, Data and Analytics, Software

Clustering is a fundamental unsupervised learning task with numerous applications across diverse fields. Popular algorithms such as k-means often struggle with outliers or imbalances, leading to distorted centroids and suboptimal partitions. We introduce K-Sil, a silhouette-guided refinement of the k-means algorithm that weights points based on their silhouette scores, prioritizing well-clustered instances while suppressing borderline or noisy regions. The algorithm emphasizes user-specified silhouette aggregation metrics: macro-, micro-averaged or a combination, through self-tuning weighting schemes, supported by appropriate sampling strategies and scalable approximations. These components ensure computational efficiency and adaptability to diverse dataset geometries. Theoretical guarantees establish centroid convergence, and empirical validation on synthetic and real-world datasets demonstrates statistically significant improvements in silhouette scores over k-means and two other instance-weighted k-means variants. These results establish K-Sil as a principled alternative for applications demanding high-quality, well-separated clusters.
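
The core idea in the abstract, weighting each point by its silhouette score before updating centroids, can be illustrated with a minimal sketch. This is not the authors' K-Sil implementation: the exponential weighting `exp(temperature * s)`, the `temperature` parameter, the optional `init` argument, and all function names are assumptions made for illustration, and the sketch omits the paper's self-tuning weights, sampling strategies, and scalable approximations. It uses the standard silhouette definition s = (b - a) / max(a, b), where a is a point's mean distance to its own cluster and b is its mean distance to the nearest other cluster.

```python
import math
import random


def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points]


def silhouette_scores(points, labels, k):
    """Exact per-point silhouette, O(n^2); degenerate cases score 0."""
    scores = []
    for i, p in enumerate(points):
        sums, counts = [0.0] * k, [0] * k
        for j, q in enumerate(points):
            if i != j:
                sums[labels[j]] += dist(p, q)
                counts[labels[j]] += 1
        own = labels[i]
        others = [sums[c] / counts[c] for c in range(k)
                  if c != own and counts[c] > 0]
        if counts[own] == 0 or not others:
            scores.append(0.0)  # singleton cluster or only one cluster in use
            continue
        a = sums[own] / counts[own]   # mean distance within own cluster
        b = min(others)               # mean distance to nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def weighted_centroids(points, labels, weights, k, previous):
    """Weighted mean per cluster; an empty cluster keeps its old centroid."""
    dim = len(points[0])
    result = []
    for c in range(k):
        members = [(p, w) for p, w, l in zip(points, weights, labels) if l == c]
        total = sum(w for _, w in members)
        if total == 0:
            result.append(previous[c])
            continue
        result.append(tuple(sum(w * p[d] for p, w in members) / total
                            for d in range(dim)))
    return result


def silhouette_weighted_kmeans(points, k, temperature=5.0, iters=20,
                               seed=0, init=None):
    """k-means variant whose centroid update is weighted by silhouette.

    The weight exp(temperature * s) is an illustrative assumption, not
    the weighting scheme defined in the paper.
    """
    rng = random.Random(seed)
    centroids = list(init) if init is not None else rng.sample(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        scores = silhouette_scores(points, labels, k)
        weights = [math.exp(temperature * s) for s in scores]
        updated = weighted_centroids(points, labels, weights, k, centroids)
        if updated == centroids:  # no movement: converged
            break
        centroids = updated
    return centroids, assign(points, centroids)
```

Calling `silhouette_weighted_kmeans(points, 2)` on a list of coordinate tuples returns the final centroids and one label per point. Because a borderline point has a lower silhouette score than the core of its cluster, it receives a smaller weight and pulls its centroid less than it would in plain k-means. Raising `temperature` strengthens that effect, and setting it to 0 recovers the unweighted update.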

Country of Origin
🇬🇷 Greece

Page Count
27 pages

Category
Computer Science:
Machine Learning (CS)