Perfect $L_p$ Sampling with Polylogarithmic Update Time
By: William Swartworth, David P. Woodruff, Samson Zhou
Potential Business Impact:
Lets computers track data changes faster.
Perfect $L_p$ sampling in a stream was introduced by Jayaram and Woodruff (FOCS 2018) as a streaming primitive which, given turnstile updates to a vector $x \in \{-\text{poly}(n), \ldots, \text{poly}(n)\}^n$, outputs an index $i^* \in \{1, 2, \ldots, n\}$ such that the probability of returning index $i$ is exactly \[\Pr[i^* = i] = \frac{|x_i|^p}{\|x\|_p^p} \pm \frac{1}{n^C},\] where $C > 0$ is an arbitrarily large constant. Jayaram and Woodruff achieved the optimal $\tilde{O}(\log^2 n)$ bits of memory for $0 < p < 2$, but their update time is at least $n^C$ per stream update. Thus an important open question is to achieve efficient update time while maintaining optimal space. For $0 < p < 2$, we give the first perfect $L_p$-sampler with the same optimal amount of memory but with only $\text{poly}(\log n)$ update time. Crucial to our result is an efficient simulation of a sum of reciprocals of powers of truncated exponential random variables by approximating its characteristic function, using the Gil-Pelaez inversion formula, and applying variants of the trapezoid formula to quickly approximate it.
Similar Papers
Perfect Sampling in Turnstile Streams Beyond Small Moments
Data Structures and Algorithms
Finds important data in huge streams of numbers.
$L_p$ Sampling in Distributed Data Streams with Applications to Adversarial Robustness
Data Structures and Algorithms
Helps servers share data perfectly and securely.
Unbiased Insights: Optimal Streaming Algorithms for $\ell_p$ Sampling, the Forget Model, and Beyond
Data Structures and Algorithms
Finds patterns in huge data streams using less space.