Convex Clustering Redefined: Robust Learning with the Median of Means Estimator
By: Sourav De, Koustav Chowdhury, Bibhabasu Mandal and more
Potential Business Impact:
Finds hidden groups in messy data without guessing.
Clustering approaches that utilize convex loss functions have recently attracted growing interest in the formation of compact data clusters. Although classical methods like k-means and its wide family of variants are still widely used, all of them require the number of clusters k to be supplied as input, and many are notably sensitive to initialization. Convex clustering provides a more stable alternative by formulating the clustering task as a convex optimization problem, ensuring a unique global solution. However, it faces challenges in handling high-dimensional data, especially in the presence of noise and outliers. Additionally, strong fusion regularization, controlled by the tuning parameter, can hinder effective cluster formation within a convex clustering framework. To overcome these challenges, we introduce a robust approach that integrates convex clustering with the Median of Means (MoM) estimator, thus developing an outlier-resistant and efficient clustering framework that does not necessitate prior knowledge of the number of clusters. By leveraging the robustness of MoM alongside the stability of convex clustering, our method enhances both performance and efficiency, especially on large-scale datasets. Theoretical analysis demonstrates weak consistency under specific conditions, while experiments on synthetic and real-world datasets validate the method's superior performance compared to existing approaches.
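For readers unfamiliar with the Median of Means idea named in the abstract, the following is a minimal sketch of the textbook MoM estimator for a one-dimensional mean: shuffle the sample, split it into blocks, average each block, and take the median of the block averages. It is not the authors' clustering procedure, which the abstract does not specify; the function name `median_of_means` and the parameters `n_blocks` and `seed` are illustrative assumptions.

```python
import random
import statistics


def median_of_means(values, n_blocks, seed=None):
    """Textbook median-of-means estimate of a 1-D mean (illustrative only)."""
    values = list(values)
    if not 1 <= n_blocks <= len(values):
        raise ValueError("n_blocks must be between 1 and the sample size")

    # Shuffle so that block membership does not depend on input order.
    rng = random.Random(seed)
    rng.shuffle(values)

    # Strided slicing yields n_blocks non-empty blocks of near-equal size.
    blocks = [values[i::n_blocks] for i in range(n_blocks)]

    # Average within each block, then take the median across blocks.
    block_means = [statistics.fmean(block) for block in blocks]
    return statistics.median(block_means)


if __name__ == "__main__":
    # 99 clean observations plus one gross outlier (made-up data).
    sample = [1.0] * 99 + [1000.0]
    print("plain mean:     ", statistics.fmean(sample))
    print("median of means:", median_of_means(sample, n_blocks=5, seed=0))
```

A single outlier can corrupt at most one block, so as long as more than half of the blocks are clean the median of the block means stays near the bulk of the data, whereas the plain mean is pulled toward the outlier. The number of blocks trades robustness against per-block variance; the value 5 here is arbitrary.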
Similar Papers
Uniform Mean Estimation for Heavy-Tailed Distributions via Median-of-Means
Machine Learning (Stat)
Finds averages in tricky data better.
On the Optimality of the Median-of-Means Estimator under Adversarial Contamination
Machine Learning (Stat)
Protects computer guesses from bad data.
A New Framework for Convex Clustering in Kernel Spaces: Finite Sample Bounds, Consistency and Performance Insights
Machine Learning (Stat)
Groups messy data into clear patterns.