Score: 0

Spectral Clustering in Birthday Paradox Time

Published: January 9, 2026 | arXiv ID: 2601.05883v1

By: Michael Kapralov, Ekaterina Kochetkova, Weronika Wrzos-Kaminska

Potential Business Impact:

Finds which group a computer network part belongs to.

Business Areas:
A/B Testing Data and Analytics

Given a vertex in a $(k, \varphi, ε)$-clusterable graph, i.e. a graph whose vertex set can be partitioned into a disjoint union of $\varphi$-expanders of size $\approx n/k$ with outer conductance bounded by $ε$, can one quickly tell which cluster it belongs to? This question goes back to the expansion testing problem of Goldreich and Ron'11. For $k=2$ a sample of $\approx n^{1/2+O(ε/\varphi^2)}$ logarithmic length walks from a given vertex approximately determines its cluster membership by the birthday paradox: two vertices whose random walk samples are `close' are likely in the same cluster. The study of the general case $k>2$ was initiated by Czumaj, Peng and Sohler [STOC'15], and the works of Chiplunkar et al. [FOCS'18], Gluch et al. [SODA'21] showed that $\approx \text{poly}(k)\cdot n^{1/2+O(ε/\varphi^2)}$ random walk samples suffice for general $k$. This matches the $k=2$ result up to polynomial factors in $k$, but creates a conceptual inconsistency: if the birthday paradox is the guiding phenomenon, then the query complexity should decrease with the number of clusters $k$! Since clusters have size $\approx n/k$, we expect to need $\approx (n/k)^{1/2+O(ε/\varphi^2)}$ random walk samples, which decreases with $k$. We design a novel representation of vertices in a $(k, \varphi, ε)$-clusterable graph by a mixture of logarithmic length walks. This representation uses the optimal $\approx (n/k)^{1/2+O(ε/\varphi^2)}$ walks per vertex, and allows for a fast nearest neighbor search: given $k$ vertices representing the clusters, we can find the cluster of a given query vertex $x$ using nearly linear time in the representation size of $x$. This gives a clustering oracle with query time $\approx (n/k)^{1/2+O(ε/\varphi^2)}$ and space complexity $k\cdot (n/k)^{1/2+O(ε/\varphi^2)}$, matching the birthday paradox bound.

Page Count
85 pages

Category
Computer Science:
Data Structures and Algorithms