Improving Statistical Privacy by Subsampling
By: Dennis Breutigam, Rüdiger Reischuk
Potential Business Impact:
Protects secrets by answering queries on random samples of the data.
Differential privacy (DP) considers a scenario where an adversary has almost complete information about the entries of a database. This worst-case assumption is likely to overestimate the privacy threat for an individual in real life. Statistical privacy (SP) denotes a setting where only the distribution of the database entries is known to an adversary, but not their exact values. In this case one has to analyze the interaction between noiseless privacy based on the entropy of distributions and privacy mechanisms that distort the answers of queries, which can be quite complex. A privacy mechanism often used is to take samples of the data for answering a query. This paper proves precise bounds on how much different methods of sampling increase privacy in the statistical setting with respect to database size and sampling rate. They allow us to deduce when and how much sampling provides an improvement and how far this depends on the privacy parameter ε. To perform these investigations we develop a framework to model sampling techniques. For the DP setting, tradeoff functions have been proposed as a finer measure for privacy compared to (ε, δ)-pairs. We apply these tools to statistical privacy with subsampling to get a comparable characterization.
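For orientation, the best-known result of this kind in the plain DP setting is privacy amplification by subsampling: if a mechanism is (ε, δ)-DP, running it on a Poisson subsample taken with rate q yields (log(1 + q(e^ε − 1)), qδ)-DP. The sketch below computes this classical bound; it is only a reference point, not the paper's SP bounds, and the function name and sample values are illustrative.

```python
import math

def amplified_epsilon(eps: float, q: float) -> float:
    """Classical DP amplification by Poisson subsampling:
    an (eps, delta)-DP mechanism applied to a subsample drawn
    with rate q satisfies (log(1 + q*(e^eps - 1)), q*delta)-DP."""
    return math.log(1.0 + q * (math.exp(eps) - 1.0))

if __name__ == "__main__":
    for q in (0.01, 0.1, 0.5):
        for eps in (0.5, 1.0, 4.0):
            print(f"q={q:4.2f}  eps={eps:3.1f}  ->  eps'={amplified_epsilon(eps, q):.4f}")
```

Note how the gain depends on ε: for small ε the bound behaves like ε' ≈ qε, while for large ε it only saves an additive log(1/q). The paper's contribution is an analogous, precise characterization for the statistical-privacy setting, where the adversary knows only the distribution of the entries.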
Similar Papers
Differential Privacy and Survey Sampling
Statistics Theory
Protects private data when counting people.
Particle Filter for Bayesian Inference on Privatized Data
Computation
Keeps secrets safe while still learning from data.
Differential privacy from axioms
Data Structures and Algorithms
Guarantees privacy even when attackers know almost everything.