New Money: A Systematic Review of Synthetic Data Generation for Finance
By: James Meldrum, Basem Suleiman, Fethi Rabhi and more
Potential Business Impact:
Generates synthetic financial data so machine learning models can be trained without exposing real, sensitive records.
Synthetic data generation has emerged as a promising approach to address the challenges of using sensitive financial data in machine learning applications. By leveraging generative models, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), it is possible to create artificial datasets that preserve the statistical properties of real financial records while mitigating privacy risks and regulatory constraints. Despite the rapid growth of this field, a comprehensive synthesis of the current research landscape has been lacking. This systematic review consolidates and analyses 72 studies published since 2018 that focus on synthetic financial data generation. We categorise the types of financial information synthesised, the generative methods employed, and the evaluation strategies used to assess data utility and privacy. The findings indicate that GAN-based approaches dominate the literature, particularly for generating time-series market data and tabular credit data. While several innovative techniques demonstrate potential for improved realism and privacy preservation, there remains a notable lack of rigorous evaluation of privacy safeguards across studies. By providing an integrated overview of generative techniques, applications, and evaluation methods, this review highlights critical research gaps and offers guidance for future work aimed at developing robust, privacy-preserving synthetic data solutions for the financial domain.