SMOTE-DP: Improving Privacy-Utility Tradeoff with Synthetic Data
By: Yan Zhou, Bradley Malin, Murat Kantarcioglu
Potential Business Impact:
Makes private data useful without losing secrets.
Privacy-preserving data publication, including synthetic data sharing, often involves a trade-off between privacy and utility. Synthetic data is generally more effective than data anonymization in balancing this trade-off, but it is not without its own challenges. Synthetic data produced by generative models trained on source data may inadvertently reveal information about outliers. Techniques specifically designed to preserve privacy, such as introducing noise to satisfy differential privacy, often incur unpredictable and significant losses in utility. In this work we show that, with the right mechanism for synthetic data generation, we can achieve strong privacy protection without significant utility loss. Synthetic data generators that produce contracting data patterns, such as the Synthetic Minority Over-sampling Technique (SMOTE), can enhance a differentially private data generator, leveraging the strengths of both. We show, both theoretically and empirically, that this SMOTE-DP technique can produce synthetic data that not only ensures robust privacy protection but also maintains utility in downstream learning tasks.
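To make the "contracting" property concrete, here is a minimal sketch of standard SMOTE-style interpolation in plain Python. It is not the paper's SMOTE-DP mechanism and provides no differential privacy guarantee on its own. The function name `smote_sample` and its parameters `k`, `n_new`, and `rng` are hypothetical names chosen for illustration. Each synthetic point lies on the segment between a source point and one of its nearest neighbours, so the output never leaves the region spanned by the input.

```python
import math
import random


def smote_sample(points, k=3, n_new=5, rng=None):
    """Return n_new synthetic points by SMOTE-style interpolation.

    Illustrative sketch only: `points` is a list of equal-length numeric
    tuples, and all names here are assumptions, not the paper's API.
    """
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(points)
        # k nearest neighbours of the base point, excluding the base itself
        neighbours = sorted(
            (p for p in points if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        lam = rng.random()  # interpolation weight in [0, 1)
        # New point on the segment between base and its chosen neighbour
        synthetic.append(
            tuple(b + lam * (o - b) for b, o in zip(base, other))
        )
    return synthetic


# Usage: oversample a small 2-D point set
source = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (4.0, 4.0), (3.0, 1.0)]
for point in smote_sample(source, k=2, n_new=3):
    print(point)
```

Because every output is a convex combination of two input points, isolated outliers are pulled toward their neighbours rather than reproduced exactly, which is the intuition behind pairing such a generator with a differentially private one.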
Similar Papers
DP-SMOTE: Integrating Differential Privacy and Oversampling Technique to Preserve Privacy in Smart Homes
Cryptography and Security
Keeps your smart home data private when shared.
SMOTE and Mirrors: Exposing Privacy Leakage from Synthetic Minority Oversampling
Cryptography and Security
Exposes private information in fake data.
Optimizing the Privacy-Utility Balance using Synthetic Data and Configurable Perturbation Pipelines
Cryptography and Security
Makes private data safe for computer learning.