Empirical Evaluation of Structured Synthetic Data Privacy Metrics: Novel experimental framework
By: Milton Nicolás Plasencia Palacios, Alexander Boudewijn, Sebastiano Saccani, and more
Potential Business Impact:
Makes fake data safe for sharing.
Synthetic data generation is gaining traction as a privacy-enhancing technology (PET). When properly generated, synthetic data preserve the analytic utility of real data while avoiding the retention of information that would allow the identification of specific individuals. However, the concept of data privacy remains elusive, making it challenging for practitioners to evaluate and benchmark the degree of privacy protection offered by synthetic data. In this paper, we propose a framework to empirically assess the efficacy of tabular synthetic data privacy quantification methods through controlled, deliberate risk insertion. To demonstrate this framework, we survey existing approaches to synthetic data privacy quantification and the related legal theory. We then apply the framework to the main privacy quantification methods under no-box threat models on publicly available datasets.
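The abstract does not spell out how risk is inserted, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that "risk insertion" means replacing a fraction of synthetic rows with exact copies of real training rows, and that the privacy metric under test is a toy score built on distance to closest record (DCR): the share of synthetic rows with a near-zero DCR. The helper names `dcr`, `insert_risk`, `exact_copy_share` and `evaluate` are hypothetical and are not taken from the paper. The check it illustrates: a useful metric should move in step with the amount of risk deliberately inserted.

```python
import math
import random

def dcr(synthetic, real):
    """Distance to closest record: Euclidean distance from each synthetic
    row to its nearest real (training) row."""
    return [min(math.dist(s, r) for r in real) for s in synthetic]

def insert_risk(synthetic, real, fraction, rng):
    """Hypothetical risk insertion (an assumption, not the paper's procedure):
    replace a fraction of synthetic rows with exact copies of real rows."""
    n_leak = round(fraction * len(synthetic))
    leaked = rng.sample(real, n_leak)
    return synthetic[: len(synthetic) - n_leak] + leaked

def exact_copy_share(synthetic, real, tol=1e-9):
    """Toy privacy score: share of synthetic rows whose DCR is (near) zero."""
    distances = dcr(synthetic, real)
    return sum(d <= tol for d in distances) / len(distances)

def evaluate(fractions, n_rows=200, seed=0):
    """Score the toy metric at each level of deliberately inserted risk."""
    rng = random.Random(seed)
    real = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_rows)]
    # Independent draws stand in for a leak-free generator in this toy setup.
    synthetic = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_rows)]
    return {f: exact_copy_share(insert_risk(synthetic, real, f, rng), real)
            for f in fractions}

if __name__ == "__main__":
    for fraction, score in evaluate([0.0, 0.1, 0.25, 0.5]).items():
        print(f"inserted {fraction:.2f} -> metric {score:.2f}")
```

An exact-match score like this one passes such a check only for verbatim copies; inserting perturbed copies instead would probe whether a metric also detects near-duplicates, which is the kind of comparison a controlled risk-insertion setup makes possible.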
Similar Papers
A Consensus Privacy Metrics Framework for Synthetic Data
Cryptography and Security
Protects private information when sharing computer-made data.
Generating Synthetic Data with Formal Privacy Guarantees: State of the Art and the Road Ahead
Cryptography and Security
Creates fake data that keeps real secrets safe.
How to DP-fy Your Data: A Practical Guide to Generating Synthetic Data With Differential Privacy
Cryptography and Security
Creates fake data that protects real people's secrets.