Disjoint Generative Models
By: Anton Danholt Lautrup, Muhammad Rajabinasab, Tobias Hyrup and more
Potential Business Impact:
Makes private data for computers without sharing secrets.
We propose a new framework for generating cross-sectional synthetic datasets via disjoint generative models. In this paradigm, a dataset is partitioned into disjoint subsets that are supplied to separate instances of generative models. The results are then combined post hoc by a joining operation that works in the absence of common variables or identifiers. The success of the framework is demonstrated through several case studies and examples on tabular data that help illuminate some of the design choices one may make. The principal benefit of disjoint generative models is significantly increased privacy at only a low utility cost. Additional findings include increased effectiveness and feasibility for certain model types and the possibility of mixed-model synthesis.
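The partition, generate, and join pipeline described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the function names (`fit_gaussian`, `sample_gaussian`, `disjoint_synthesize`) are hypothetical, a multivariate Gaussian stands in for whatever generative model one would actually use, the partition is assumed to be over columns, and the join is a naive positional concatenation of independently sampled rows rather than the authors' joining operation.

```python
import numpy as np


def fit_gaussian(data):
    """Fit a multivariate Gaussian as a stand-in generative model (assumption)."""
    mean = data.mean(axis=0)
    # atleast_2d keeps the covariance a matrix even for a single column.
    cov = np.atleast_2d(np.cov(data, rowvar=False))
    return mean, cov


def sample_gaussian(model, n_samples, rng):
    """Draw n_samples synthetic rows from a fitted Gaussian."""
    mean, cov = model
    return rng.multivariate_normal(mean, cov, size=n_samples)


def disjoint_synthesize(data, partitions, n_samples, seed=0):
    """Train one model per disjoint column subset, then join the samples.

    `partitions` is a list of lists of column indices. The join here is a
    naive positional concatenation (an assumption): no shared variables or
    identifiers are needed because each model fills its own columns.
    """
    flat = [c for cols in partitions for c in cols]
    if sorted(flat) != list(range(data.shape[1])):
        raise ValueError("partitions must be disjoint and cover every column")

    rng = np.random.default_rng(seed)
    synthetic = np.empty((n_samples, data.shape[1]))
    for cols in partitions:
        # Each model instance only ever sees its own subset of columns.
        model = fit_gaussian(data[:, cols])
        synthetic[:, cols] = sample_gaussian(model, n_samples, rng)
    return synthetic
```

A call such as `disjoint_synthesize(real, [[0, 1], [2, 3]], n_samples=1000)` returns an array with the same number of columns as `real`. In this naive version, dependencies between columns that belong to different partitions are not carried over to the synthetic data, so how the columns are partitioned and how the parts are joined are the design choices that matter most for utility.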
Similar Papers
Bridging the Generalisation Gap: Synthetic Data Generation for Multi-Site Clinical Model Validation
Machine Learning (CS)
Makes medical AI work everywhere, fairly.
Assessing Generative Models for Structured Data
Machine Learning (CS)
Makes fake data that looks like real data.
A Comprehensive Survey of Synthetic Tabular Data Generation
Machine Learning (CS)
Creates fake data for computers to learn from.