Near-optimal algorithms for private estimation and sequential testing of collision probability
By: Robert Busa-Fekete, Umar Syed
Potential Business Impact:
Measures how spread out data is, privately.
We present new algorithms for estimating and testing \emph{collision probability}, a fundamental measure of the spread of a discrete distribution that is widely used in many scientific fields. We describe an algorithm that satisfies $(\alpha, \beta)$-local differential privacy and estimates collision probability with error at most $\epsilon$ using $\tilde{O}\left(\frac{\log(1/\beta)}{\alpha^2 \epsilon^2}\right)$ samples for $\alpha \le 1$, which improves over previous work by a factor of $\frac{1}{\alpha^2}$. We also present a sequential testing algorithm for collision probability, which can distinguish between collision probability values that are separated by $\epsilon$ using $\tilde{O}(\frac{1}{\epsilon^2})$ samples, even when $\epsilon$ is unknown. Our algorithms have nearly the optimal sample complexity, and in experiments we show that they require significantly fewer samples than previous methods.
Similar Papers
Differential Privacy and Survey Sampling
Statistics Theory
Protects private data when counting people.
Approximate Algorithms for Verifying Differential Privacy with Gaussian Distributions
Cryptography and Security
Checks if private data stays secret.
Nearly-Linear Time Private Hypothesis Selection with the Optimal Approximation Factor
Data Structures and Algorithms
Finds the best data pattern while keeping it private.