Fast, Space-Optimal Streaming Algorithms for Clustering and Subspace Embeddings
By: Vincent Cohen-Addad , Liudeng Wang , David P. Woodruff and more
Potential Business Impact:
Organizes data faster, even with huge amounts.
We show that both clustering and subspace embeddings can be performed in the streaming model with the same asymptotic efficiency as in the central/offline setting. For $(k, z)$-clustering in the streaming model, we achieve a number of words of memory which is independent of the number $n$ of input points and the aspect ratio $\Delta$, yielding an optimal bound of $\tilde{\mathcal{O}}\left(\frac{dk}{\min(\varepsilon^4,\varepsilon^{z+2})}\right)$ words for accuracy parameter $\varepsilon$ on $d$-dimensional points. Additionally, we obtain amortized update time of $d\,\log(k)\cdot\text{polylog}(\log(n\Delta))$, which is an exponential improvement over the previous $d\,\text{poly}(k,\log(n\Delta))$. Our method also gives the fastest runtime for $(k,z)$-clustering even in the offline setting. For subspace embeddings in the streaming model, we achieve $\mathcal{O}(d)$ update time and space-optimal constructions, using $\tilde{\mathcal{O}}\left(\frac{d^2}{\varepsilon^2}\right)$ words for $p\le 2$ and $\tilde{\mathcal{O}}\left(\frac{d^{p/2+1}}{\varepsilon^2}\right)$ words for $p>2$, showing that streaming algorithms can match offline algorithms in both space and time complexity.
Similar Papers
Improved Streaming Algorithm for Fair $k$-Center Clustering
Data Structures and Algorithms
Helps computers group data fairly for everyone.
Guessing Efficiently for Constrained Subspace Approximation
Data Structures and Algorithms
Finds the best flat shape for scattered points.
Near-optimal Hypergraph Sparsification in Insertion-only and Bounded-deletion Streams
Data Structures and Algorithms
Simplifies complex network data with less memory.