The Adverse Effects of Omitting Records in Differential Privacy: How Sampling and Suppression Degrade the Privacy-Utility Tradeoff (Long Version)
By: Àlex Miranda-Pascual, Javier Parra-Arnau, Thorsten Strufe
Potential Business Impact:
Sampling makes private data less useful.
Sampling is renowned for its privacy amplification in differential privacy (DP) and is often assumed to improve the utility of a DP mechanism by permitting a noise reduction. In this paper, we show that this assumption is flawed: when utility is measured at equal privacy levels, sampling as a preprocessing step consistently incurs a penalty, because of the utility lost by omitting records, across all canonical DP mechanisms -- Laplace, Gaussian, exponential, and report noisy max -- as well as in recent applications of sampling, such as clustering. Extending this analysis, we investigate suppression as a generalized method of choosing, or omitting, records. Developing a theoretical analysis of this technique, we derive privacy bounds for arbitrary suppression strategies under unbounded approximate DP. We find that our tested suppression strategy also fails to improve the privacy-utility tradeoff. Surprisingly, uniform sampling emerges as one of the best suppression methods, even though it still degrades the tradeoff. Our results call into question common preprocessing assumptions in DP practice.
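The tradeoff the abstract describes can be made concrete with the standard amplification-by-subsampling bound for pure DP: running an ε-DP mechanism on a uniform q-subsample satisfies ε' = log(1 + q(e^ε − 1))-DP. Inverting this shows how much extra per-run budget (and hence how much less Laplace noise) sampling buys at a fixed overall guarantee. The sketch below is illustrative only and is not taken from the paper; the function names and the target parameters (q = 0.1, ε = 1.0) are our own choices.

```python
import math

def amplified_epsilon(eps_base: float, q: float) -> float:
    """Amplification by uniform subsampling (pure DP):
    an eps_base-DP mechanism run on a q-subsample is
    log(1 + q*(e^eps_base - 1))-DP overall."""
    return math.log(1.0 + q * (math.exp(eps_base) - 1.0))

def base_epsilon_for_target(eps_target: float, q: float) -> float:
    """Invert the bound: the per-run budget available at sampling
    rate q while keeping the overall guarantee at eps_target."""
    return math.log(1.0 + (math.exp(eps_target) - 1.0) / q)

q = 0.1          # sampling rate (assumed for illustration)
eps_target = 1.0 # overall privacy budget (assumed for illustration)

eps_base = base_epsilon_for_target(eps_target, q)
# eps_base > eps_target, so for a sensitivity-1 query the inner
# Laplace mechanism may use a smaller noise scale 1/eps_base
# instead of 1/eps_target -- the "noise reduction" from sampling.
scale_full = 1.0 / eps_target
scale_sampled = 1.0 / eps_base
```

The noise reduction is real, but the sampled mechanism only sees a q-fraction of the records; the paper's point is that, at equal privacy, the utility lost to the omitted records outweighs this reduction for the canonical mechanisms.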
Similar Papers
Improving Statistical Privacy by Subsampling
Cryptography and Security
Protects secrets by adding random noise to data.
Statistical Privacy
Cryptography and Security
Protects your data even if hackers know how it's made.
SMOTE-DP: Improving Privacy-Utility Tradeoff with Synthetic Data
Machine Learning (CS)
Makes private data useful without losing secrets.