Quality Degradation Attack in Synthetic Data
By: Qinyi Liu, Dong Liu, Farhad Vadiee, and more
Potential Business Impact:
Shows how bad guys can quietly ruin fake data, so sharing systems can be made safer.
Synthetic Data Generation (SDG) can be used to facilitate privacy-preserving data sharing. However, most existing research focuses on privacy attacks where the adversary is the recipient of the released synthetic data and attempts to infer sensitive information from it. This study investigates quality degradation attacks initiated by adversaries who possess access to the real dataset or control over the generation process, such as the data owner, the synthetic data provider, or potential intruders. We formalize a corresponding threat model and empirically evaluate the effectiveness of targeted manipulations of real data (e.g., label flipping and feature-importance-based interventions) on the quality of generated synthetic data. The results show that even small perturbations can substantially reduce downstream predictive performance and increase statistical divergence, exposing vulnerabilities within SDG pipelines. This study highlights the need to integrate integrity verification and robustness mechanisms, alongside privacy protection, to ensure the reliability and trustworthiness of synthetic data sharing frameworks.
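The attack described above can be illustrated with a minimal, dependency-light sketch. This is not the paper's actual pipeline: it substitutes a per-class Gaussian fit-and-sample step for a real SDG model (e.g., a GAN- or copula-based generator), uses a toy two-class dataset, and measures quality via downstream accuracy of a simple nearest-centroid classifier plus a class-mean shift as a crude stand-in for statistical divergence. All function names and parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n_per_class, rng):
    # Toy "real" dataset: two well-separated Gaussian classes in 2-D.
    X0 = rng.normal(-1.0, 1.0, size=(n_per_class, 2))
    X1 = rng.normal(+1.0, 1.0, size=(n_per_class, 2))
    return np.vstack([X0, X1]), np.repeat([0, 1], n_per_class)

def flip_labels(y, rate, rng):
    # Adversary with access to the real data flips a fraction of labels
    # before the data reaches the synthetic data generator.
    y = y.copy()
    idx = rng.choice(len(y), size=int(rate * len(y)), replace=False)
    y[idx] = 1 - y[idx]
    return y

def generate_synthetic(X, y, n_per_class, rng):
    # Stand-in SDG: fit a per-class Gaussian and sample from it.
    # A real pipeline would train a generative model here instead.
    Xs, ys = [], []
    for c in (0, 1):
        Xc = X[y == c]
        Xs.append(rng.normal(Xc.mean(axis=0), Xc.std(axis=0),
                             size=(n_per_class, 2)))
        ys.append(np.full(n_per_class, c))
    return np.vstack(Xs), np.concatenate(ys)

def nearest_centroid_acc(Xtr, ytr, Xte, yte):
    # Downstream model trained on synthetic data, evaluated on clean real data.
    c0 = Xtr[ytr == 0].mean(axis=0)
    c1 = Xtr[ytr == 1].mean(axis=0)
    pred = (((Xte - c1) ** 2).sum(1) < ((Xte - c0) ** 2).sum(1)).astype(int)
    return (pred == yte).mean()

X_real, y_real = make_data(500, rng)
X_test, y_test = make_data(200, rng)

results = {}
for rate in (0.0, 0.1, 0.3):
    y_pois = flip_labels(y_real, rate, rng)
    X_syn, y_syn = generate_synthetic(X_real, y_pois, 500, rng)
    acc = nearest_centroid_acc(X_syn, y_syn, X_test, y_test)
    # Crude fidelity metric: how far the synthetic class-0 mean has
    # drifted from the true class-0 mean at (-1, -1).
    shift = np.linalg.norm(X_syn[y_syn == 0].mean(axis=0) - (-1.0))
    results[rate] = (acc, shift)
    print(f"flip rate {rate:.1f}: downstream acc {acc:.3f}, "
          f"class-0 mean shift {shift:.3f}")
```

Even this toy setup shows the effect the paper formalizes: as the flip rate grows, the poisoned class-conditional distributions drift, the synthetic data inherits that drift, and the divergence from the clean real data increases, even though the generator itself is untouched.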
Similar Papers
Empirical Evaluation of Structured Synthetic Data Privacy Metrics: Novel experimental framework
Cryptography and Security
Makes fake data safe for sharing.
Causal Synthetic Data Generation in Recruitment
Machine Learning (CS)
Creates fair job rankings without using real people's private info.
Leveraging Vertical Public-Private Split for Improved Synthetic Data Generation
Machine Learning (CS)
Makes private data useful without showing real info.