Privacy Auditing Synthetic Data Release through Local Likelihood Attacks
By: Joshua Ward, Chi-Hua Wang, Guang Cheng
Potential Business Impact:
Finds hidden private information in fake data.
Auditing the privacy leakage of synthetic data is an important but unresolved problem. Most existing privacy auditing frameworks for synthetic data rely on heuristics and unreasonable assumptions to attack the failure modes of generative models, exhibiting limited capability to describe and detect the privacy exposure of training data through synthetic data release. In this paper, we study the design of Membership Inference Attacks (MIAs) that specifically exploit the observation that tabular generative models tend to significantly overfit to certain regions of the training distribution. We propose the Generative Likelihood Ratio Attack (Gen-LRA), a novel, computationally efficient No-Box MIA that, assuming no model knowledge or access, formulates its attack by evaluating the influence a test observation has on a surrogate model's estimation of a local likelihood ratio over the synthetic data. Assessed over a comprehensive benchmark spanning diverse datasets, model architectures, and attack parameters, we find that Gen-LRA consistently dominates other MIAs for generative models across multiple performance metrics. These results underscore Gen-LRA's effectiveness as a privacy auditing tool for the release of synthetic data, highlighting the significant privacy risks posed by generative model overfitting in real-world applications.
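The mechanism described in the abstract is a local likelihood-ratio test: a surrogate density model is fit with and without the candidate record, and the two fits are compared on the synthetic points nearest that record. Below is a minimal, hypothetical sketch of that idea, assuming a kernel density estimator as the surrogate model, a k-nearest-neighbor notion of locality, and access to a reference dataset; the function name, parameters, and modeling choices are illustrative and are not the authors' implementation.

```python
import numpy as np
from sklearn.neighbors import KernelDensity, NearestNeighbors

def local_likelihood_ratio_score(x_test, synthetic, reference, k=20, bandwidth=0.5):
    """Hypothetical sketch of a local likelihood-ratio membership score.

    Fits one surrogate density on the reference data alone and one on the
    reference data augmented with the candidate record, then compares their
    log-likelihoods on the k synthetic points nearest to the candidate.
    A larger score suggests the candidate had more local influence on the
    synthetic data, i.e. stronger evidence of training-set membership.
    """
    x_test = np.atleast_2d(x_test)

    # Restrict evaluation to the local neighborhood of x_test within the
    # synthetic sample (the k-NN locality choice is an assumption here).
    nn = NearestNeighbors(n_neighbors=k).fit(synthetic)
    _, idx = nn.kneighbors(x_test)
    local_synth = synthetic[idx[0]]

    # Surrogate density without the candidate record (null model).
    kde_null = KernelDensity(bandwidth=bandwidth).fit(reference)
    # Surrogate density with the candidate record added (alternative model).
    kde_alt = KernelDensity(bandwidth=bandwidth).fit(np.vstack([reference, x_test]))

    # Local log-likelihood ratio summed over the nearby synthetic points.
    return float(np.sum(kde_alt.score_samples(local_synth)
                        - kde_null.score_samples(local_synth)))
```

In use, a larger score for a candidate record indicates that including it noticeably raises the surrogate's local likelihood of the synthetic data; ranking or thresholding these scores across candidate records would yield the membership inference decision used for privacy auditing.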
Similar Papers
Synth-MIA: A Testbed for Auditing Privacy Leakage in Tabular Data Synthesis
Cryptography and Security
Finds hidden secrets in fake data.
When Tables Leak: Attacking String Memorization in LLM-Based Tabular Data Generation
Machine Learning (CS)
AI makes fake data that accidentally reveals real secrets.
Empirical Comparison of Membership Inference Attacks in Deep Transfer Learning
Machine Learning (CS)
Finds best ways to check if AI learned private info.