A Consensus Privacy Metrics Framework for Synthetic Data
By: Lisa Pilgram, Fida K. Dankar, Jörg Drechsler, and more
Potential Business Impact:
Protects private information when sharing computer-made data.
Synthetic data generation is one approach to sharing individual-level data. However, to meet legislative requirements, it is necessary to demonstrate that individuals' privacy is adequately protected. There is currently no consolidated standard for measuring privacy in synthetic data. Through an expert panel and consensus process, we developed a framework for evaluating privacy in synthetic data. Our findings indicate that current similarity metrics fail to measure identity disclosure, and their use is discouraged. For differentially private synthetic data, only a privacy budget close to zero was considered interpretable. There was consensus on the importance of membership and attribute disclosure, both of which involve inferring personal information about an individual without necessarily revealing their identity. The resultant framework provides precise recommendations for metrics that address these types of disclosure effectively. Our findings also identify specific opportunities for future research that can help drive widespread adoption of synthetic data.
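To make the discussion concrete, here is a minimal sketch of a distance-to-closest-record (DCR) computation, a common similarity metric of the kind the abstract says fails to measure identity disclosure. The function name and toy data are illustrative assumptions, not taken from the paper.

```python
import math

def dcr(synthetic, real):
    """For each synthetic record, compute the Euclidean distance to its
    nearest real record. Low values are often read as high privacy risk,
    an interpretation the consensus framework cautions against."""
    return [min(math.dist(s, r) for r in real) for s in synthetic]

# Toy numeric records (illustrative only).
real = [(0.0, 0.0), (1.0, 1.0)]
synthetic = [(0.1, 0.0), (2.0, 2.0)]
print(dcr(synthetic, real))
```

A synthetic record can sit far from every real record by this measure and still leak membership or attribute information, which is why the framework favors metrics that target those disclosure types directly.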
Similar Papers
Generating Synthetic Data with Formal Privacy Guarantees: State of the Art and the Road Ahead
Cryptography and Security
Creates fake data that keeps real secrets safe.
The DCR Delusion: Measuring the Privacy Risk of Synthetic Data
Cryptography and Security
Makes fake data safer by finding hidden secrets.
The Data Sharing Paradox of Synthetic Data in Healthcare
Databases
Makes private health data safe for sharing.