New Algorithmic Directions in Optimal Transport and Applications for Product Spaces
By: Salman Beigi , Omid Etesami , Mohammad Mahmoody and more
Potential Business Impact:
Finds similar data points in huge datasets quickly.
We study optimal transport between two high-dimensional distributions $\mu,\nu$ in $R^n$ from an algorithmic perspective: given $x \sim \mu$, find a close $y \sim \nu$ in $poly(n)$ time, where $n$ is the dimension of $x,y$. Thus, running time depends on the dimension rather than the full representation size of $\mu,\nu$. Our main result is a general algorithm for transporting any product distribution $\mu$ to any $\nu$ with cost $\Delta + \delta$ under $\ell_p^p$, where $\Delta$ is the Knothe-Rosenblatt transport cost and $\delta$ is a computational error decreasing with runtime. This requires $\nu$ to be "sequentially samplable" with bounded average sampling cost, a new but natural notion. We further prove: An algorithmic version of Talagrand's inequality for transporting the standard Gaussian $\Phi^n$ to arbitrary $\nu$ under squared Euclidean cost. For $\nu = \Phi^n$ conditioned on a set $\mathcal{S}$ of measure $\varepsilon$, we construct the sequential sampler in expected time $poly(n/\varepsilon)$ using membership oracle access to $\mathcal{S}$. This yields an algorithmic transport from $\Phi^n$ to $\Phi^n|\mathcal{S}$ in $poly(n/\varepsilon)$ time and expected squared distance $O(\log 1/\varepsilon)$, optimal for general $\mathcal{S}$ of measure $\varepsilon$. As corollary, we obtain the first computational concentration result (Etesami et al. SODA 2020) for Gaussian measure under Euclidean distance with dimension-independent transportation cost, resolving an open question of Etesami et al. Specifically, for any $\mathcal{S}$ of Gaussian measure $\varepsilon$, most $\Phi^n$ samples can be mapped to $\mathcal{S}$ within distance $O(\sqrt{\log 1/\varepsilon})$ in $poly(n/\varepsilon)$ time.
Similar Papers
General Divergence Regularized Optimal Transport: Sample Complexity and Central Limit Theorems
Statistics Theory
Makes complex math work better in big problems.
Unidimensional semi-discrete partial optimal transport
Optimization and Control
Moves shapes more efficiently for computers.
Distributional Shrinkage II: Optimal Transport Denoisers with Higher-Order Scores
Statistics Theory
Cleans up messy signals using math.