
CAS Condensed and Accelerated Silhouette: An Efficient Method for Determining the Optimal K in K-Means Clustering

Published: July 11, 2025 | arXiv ID: 2507.08311v1

By: Krishnendu Das, Sumit Gupta, Awadhesh Kumar

Potential Business Impact:

Finds best groups in data much faster.

Business Areas:

Image Recognition, Data and Analytics, Software

Clustering is a critical component of decision-making in today's data-driven environments. It has been widely used in a variety of fields such as bioinformatics, social network analysis, and image processing. However, clustering accuracy remains a major challenge in large datasets. This paper presents a comprehensive overview of strategies for selecting the optimal value of k in clustering, with a focus on balancing clustering precision and computational efficiency in complex data environments. In addition, this paper introduces improvements to clustering techniques for text and image data to provide insights into better computational performance and cluster validity. The proposed approach is based on the Condensed Silhouette method, along with statistical methods such as Local Structures, Gap Statistics, Class Consistency Ratio (CCR), and Cluster Overlap Index (COI), in a CCR- and COI-based algorithm that calculates the best value of k for K-Means clustering. Comparative experiments show that the proposed approach achieves up to 99 percent faster execution times on high-dimensional datasets while retaining both precision and scalability, making it well suited to real-time clustering or to scenarios that demand efficient clustering with minimal resource utilization.
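For orientation, the ordinary silhouette-based selection of k that methods like this aim to accelerate can be sketched as follows. This is a minimal illustration only: the abstract does not spell out the Condensed Silhouette, CCR, or COI computations, so none of them are reproduced here. The function names (`kmeans`, `mean_silhouette`, `best_k`), the farthest-point initialization, and the toy data are all assumptions made for the example.

```python
import math

def farthest_point_init(points, k):
    # Deterministic seeding (an assumption for this sketch): start from the
    # first point, then repeatedly add the point farthest from its nearest center.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    # Plain Lloyd iterations; returns one cluster label per point.
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous center
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

def mean_silhouette(points, labels):
    # Mean of s(i) = (b - a) / max(a, b) over all points, with s(i) = 0 for singletons.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0  # silhouette is undefined for a single cluster
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        total += (b - a) / max(a, b)
    return total / len(points)

def best_k(points, candidates):
    # Score every candidate k and return the one with the highest mean silhouette.
    scores = {k: mean_silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores

# Toy data: three well-separated blobs of five points each.
blob_centers = [(0, 0), (10, 0), (0, 10)]
offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5), (0, 0)]
points = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx, dy in offsets]

chosen_k, scores = best_k(points, range(2, 6))
print(chosen_k, scores)
```

Each silhouette evaluation compares every point with every other point, so the cost grows quadratically with the number of points and is repeated for each candidate k. That per-candidate cost is the kind of expense a faster selection method would need to reduce.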

Estimating the Optimal Number of Clusters in Categorical Data Clustering by Silhouette Coefficient

Machine Learning (CS)

Finds the best number of groups in data.

26 Jan 2025 0

89%

Silhouette-Guided Instance-Weighted k-means

Machine Learning (CS)

Improves computer grouping by ignoring bad data.

15 Jun 2025 1

87%

Scalable Parameter-Light Spectral Method for Clustering Short Text Embeddings with a Cohesion-Based Evaluation Metric

Machine Learning (CS)

Finds hidden groups in text without guessing.

24 Nov 2025 0


Page Count

13 pages
