Generating Synthetic Relational Tabular Data via Structural Causal Models

Published: July 4, 2025 | arXiv ID: 2507.03528v1

By: Frederik Hoppe, Astrid Franz, Lars Kleinemeier, and others

Potential Business Impact:

Creates realistic fake data from linked tables.

Business Areas:
Simulation Software

Synthetic tabular data generation has received increasing attention in recent years, particularly with the emergence of foundation models for tabular data. The breakthrough success of TabPFN (Hollmann et al., 2025), which leverages vast quantities of synthetic tabular datasets derived from structural causal models (SCMs), demonstrates the critical role synthetic data plays in developing powerful tabular foundation models. However, most real-world tabular data exists in relational formats spanning multiple interconnected tables, a structure that current generation methods do not adequately address. In this work, we extend the SCM-based approach by developing a novel framework that generates realistic synthetic relational tabular data, including causal relationships across tables. Our experiments confirm that this framework can construct relational datasets with complex inter-table dependencies that mimic real-world scenarios.
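The abstract does not spell out the generation procedure. To make the general idea concrete (one SCM per table, with causal edges carried across a foreign-key link), here is a minimal sketch under our own assumptions, not the authors' framework: the tanh-of-weighted-sum mechanism, the 0.5 edge probability, the Gaussian noise scale of 0.1, and the function names `sample_scm_table` and `generate_relational` are all hypothetical choices made for illustration.

```python
import math
import random


def sample_scm_table(n_rows, n_cols, rng, parent_context=None):
    """Sample rows from a random SCM over n_cols nodes (hypothetical sketch).

    Nodes are in topological order: node j may only depend on nodes 0..j-1.
    If parent_context is given (one feature list per row), every node also
    receives a weighted contribution from it, modelling a causal edge from
    the linked row of another table.
    """
    # Random DAG: each earlier node is a cause of node j with probability 0.5.
    weights = [[rng.gauss(0, 1) if i < j and rng.random() < 0.5 else 0.0
                for i in range(n_cols)] for j in range(n_cols)]
    ctx_dim = len(parent_context[0]) if parent_context else 0
    # Weights for the inter-table (cross-table) causal edges.
    ctx_weights = [[rng.gauss(0, 1) for _ in range(ctx_dim)]
                   for _ in range(n_cols)]
    rows = []
    for r in range(n_rows):
        values = []
        for j in range(n_cols):
            # Within-table causes: earlier nodes in the same row.
            signal = sum(weights[j][i] * values[i] for i in range(j))
            if parent_context:
                # Cross-table causes: features of the referenced parent row.
                signal += sum(w * x for w, x in zip(ctx_weights[j], parent_context[r]))
            # Assumed mechanism: nonlinearity plus additive Gaussian noise.
            values.append(math.tanh(signal) + rng.gauss(0, 0.1))
        rows.append(values)
    return rows


def generate_relational(n_parents=5, n_children=20, seed=0):
    """Generate a parent table and a child table linked by a foreign key."""
    rng = random.Random(seed)
    parents = sample_scm_table(n_parents, 3, rng)
    # Each child row references one parent row via a foreign key.
    fks = [rng.randrange(n_parents) for _ in range(n_children)]
    context = [parents[k] for k in fks]
    children = sample_scm_table(n_children, 2, rng, parent_context=context)
    parent_table = [{"id": i, "x": row} for i, row in enumerate(parents)]
    child_table = [{"id": i, "parent_id": fk, "y": row}
                   for i, (fk, row) in enumerate(zip(fks, children))]
    return parent_table, child_table


if __name__ == "__main__":
    parents, children = generate_relational(seed=42)
    print(parents[0])
    print(children[0])
```

Conditioning each child row on the features of the parent row it references is what creates the inter-table dependency; a fuller generator would also randomize table schemas, key cardinalities, and mechanisms, which this sketch fixes for brevity.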

Page Count
6 pages

Category
Computer Science: Machine Learning (CS)