Maximum Coverage in Turnstile Streams with Applications to Fingerprinting Measures
By: Alina Ene , Alessandro Epasto , Vahab Mirrokni and more
Potential Business Impact:
Finds best groups to cover most things.
In the maximum coverage problem we are given $d$ subsets from a universe $[n]$, and the goal is to output $k$ subsets such that their union covers the largest possible number of distinct items. We present the first algorithm for maximum coverage in the turnstile streaming model, where updates which insert or delete an item from a subset come one-by-one. Notably our algorithm only uses $poly\log n$ update time. We also present turnstile streaming algorithms for targeted and general fingerprinting for risk management where the goal is to determine which features pose the greatest re-identification risk in a dataset. As part of our work, we give a result of independent interest: an algorithm to estimate the complement of the $p^{\text{th}}$ frequency moment of a vector for $p \geq 2$. Empirical evaluation confirms the practicality of our fingerprinting algorithms demonstrating a speedup of up to $210$x over prior work.
Similar Papers
Differentially Private Space-Efficient Algorithms for Counting Distinct Elements in the Turnstile Model
Data Structures and Algorithms
Counts unique items privately with less memory.
Subtrajectory Clustering and Coverage Maximization in Cubic Time, or Better
Computational Geometry
Finds patterns in movement data using math.
Perfect Sampling in Turnstile Streams Beyond Small Moments
Data Structures and Algorithms
Finds important data in huge streams of numbers.