Approximating High-Dimensional Earth Mover's Distance as Fast as Closest Pair
By: Lorenzo Beretta, Vincent Cohen-Addad, Rajesh Jayaram and more
Potential Business Impact:
Approximates Earth Mover's Distance between high-dimensional point sets nearly as fast as finding a closest pair.
We give a reduction from $(1+\varepsilon)$-approximate Earth Mover's Distance (EMD) to $(1+\varepsilon)$-approximate Closest Pair (CP). As a consequence, we improve the fastest known approximation algorithm for high-dimensional EMD. Here, given $p\in [1, 2]$ and two sets of $n$ points $X,Y \subseteq (\mathbb R^d,\ell_p)$, their EMD is the minimum cost of a perfect matching between $X$ and $Y$, where the cost of matching two vectors is their $\ell_p$ distance. Further, CP is the basic problem of finding a pair of points realizing $\min_{x \in X, y\in Y} ||x-y||_p$. Our contribution is twofold: we show that if a $(1+\varepsilon)$-approximate CP can be computed in time $n^{2-\phi}$, then a $1+O(\varepsilon)$ approximation to EMD can be computed in time $n^{2-\Omega(\phi)}$; plugging in the fastest known algorithm for CP [Alman, Chan, Williams FOCS'16], we obtain a $(1+\varepsilon)$-approximation algorithm for EMD running in time $n^{2-\tilde{\Omega}(\varepsilon^{1/3})}$ for high-dimensional point sets, which improves over the prior fastest running time of $n^{2-\Omega(\varepsilon^2)}$ [Andoni, Zhang FOCS'23]. Our main technical contribution is a sublinear implementation of the Multiplicative Weights Update framework for EMD. Specifically, we demonstrate that the updates can be executed without ever explicitly computing or storing the weights; instead, we exploit the underlying geometric structure to perform the updates implicitly.
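To make the two problems in the abstract concrete, here is a minimal brute-force sketch of their definitions. It is not the paper's algorithm: the function names (`lp_dist`, `closest_pair`, `emd_bruteforce`) and the sample points are illustrative assumptions, and the EMD routine enumerates every perfect matching, so it is only usable for very small $n$.

```python
from itertools import permutations


def lp_dist(x, y, p=1.0):
    """l_p distance between two equal-length vectors (p in [1, 2] in the abstract)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def closest_pair(X, Y, p=1.0):
    """Exact CP by brute force: returns (distance, i, j) minimizing ||X[i] - Y[j]||_p.

    Uses n^2 distance evaluations; ties are broken by the smallest (i, j).
    """
    return min(
        (lp_dist(x, y, p), i, j)
        for i, x in enumerate(X)
        for j, y in enumerate(Y)
    )


def emd_bruteforce(X, Y, p=1.0):
    """Exact EMD as the minimum-cost perfect matching between X and Y.

    Tries all n! matchings, so this only illustrates the definition.
    """
    n = len(X)
    if len(Y) != n:
        raise ValueError("X and Y must contain the same number of points")
    return min(
        sum(lp_dist(X[i], Y[perm[i]], p) for i in range(n))
        for perm in permutations(range(n))
    )


# Hypothetical sample data: three points per set in the plane, l_1 metric.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
Y = [(0.0, 1.0), (3.0, 0.0), (1.0, 1.0)]

cp_dist, i, j = closest_pair(X, Y, p=1.0)
emd = emd_bruteforce(X, Y, p=1.0)
print(cp_dist, (i, j), emd)
```

Every edge of a perfect matching costs at least the closest-pair distance, so on any input the EMD is at least $n$ times the CP value. This simple relation is only a sanity check and is separate from the reduction described in the abstract.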