On Tight Robust Coresets for $k$-Medians Clustering
By: Lingxiao Huang , Zhenyu Jiang , Yi Li and more
Potential Business Impact:
Finds bad data points to make computer groups better.
This paper considers coresets for the robust $k$-medians problem with $m$ outliers, and new constructions in various metric spaces are obtained. Specifically, for metric spaces with a bounded VC or doubling dimension $d$, the coreset size is $O(m) + \tilde{O}(kd\varepsilon^{-2})$, which is optimal up to logarithmic factors. For Euclidean spaces, the coreset size is $O(m\varepsilon^{-1}) + \tilde{O}(\min\{k^{4/3}\varepsilon^{-2},k\varepsilon^{-3}\})$, improving upon a recent result by Jiang and Lou (ICALP 2025). These results also extend to robust $(k,z)$-clustering, yielding, for VC and doubling dimension, a coreset size of $O(m) + \tilde{O}(kd\varepsilon^{-2z})$ with the optimal linear dependence on $m$. This extended result improves upon the earlier work of Huang et al. (SODA 2025). The techniques introduce novel dataset decompositions, enabling chaining arguments to be applied jointly across multiple components.
Similar Papers
Coreset for Robust Geometric Median: Eliminating Size Dependency on Outliers
Data Structures and Algorithms
Makes data summaries better by ignoring bad data.
Linear time small coresets for k-mean clustering of segments with applications
Machine Learning (CS)
Groups shapes in videos faster and more accurately.
Linear time small coresets for k-mean clustering of segments with applications
Machine Learning (CS)
Groups shapes in data much faster.