Score: 2

Near-optimal algorithms for private estimation and sequential testing of collision probability

Published: April 18, 2025 | arXiv ID: 2504.13804v1

By: Robert Busa-Fekete, Umar Syed

BigTech Affiliations: Google

Potential Business Impact:

Measures how spread out data is, privately.

Business Areas:
A/B Testing Data and Analytics

We present new algorithms for estimating and testing \emph{collision probability}, a fundamental measure of the spread of a discrete distribution that is widely used in many scientific fields. We describe an algorithm that satisfies $(\alpha, \beta)$-local differential privacy and estimates collision probability with error at most $\epsilon$ using $\tilde{O}\left(\frac{\log(1/\beta)}{\alpha^2 \epsilon^2}\right)$ samples for $\alpha \le 1$, which improves over previous work by a factor of $\frac{1}{\alpha^2}$. We also present a sequential testing algorithm for collision probability, which can distinguish between collision probability values that are separated by $\epsilon$ using $\tilde{O}(\frac{1}{\epsilon^2})$ samples, even when $\epsilon$ is unknown. Our algorithms have nearly the optimal sample complexity, and in experiments we show that they require significantly fewer samples than previous methods.

Country of Origin
🇺🇸 United States

Repos / Data Links

Page Count
25 pages

Category
Statistics:
Machine Learning (Stat)