Doubling Your Data in Minutes: Ultra-fast Tabular Data Generation via LLM-Induced Dependency Graphs
By: Shuo Yang, Zheyu Zhang, Bardh Prenkaj, and more
Potential Business Impact:
Makes fake data for computers faster and safer.
Tabular data is critical across diverse domains, yet high-quality datasets remain scarce due to privacy concerns and the cost of collection. Contemporary approaches adopt large language models (LLMs) for tabular augmentation, but exhibit two major limitations: (1) dense dependency modeling among tabular features that can introduce bias, and (2) high computational overhead in sampling. To address these issues, we propose SPADA for SPArse Dependency-driven Augmentation, a lightweight generative framework that explicitly captures sparse dependencies via an LLM-induced graph. We treat each feature as a node and synthesize values by traversing the graph, conditioning each feature solely on its parent nodes. We explore two synthesis strategies: a non-parametric method using Gaussian kernel density estimation, and a conditional normalizing flow model that learns invertible mappings for conditional density estimation. Experiments on four datasets show that SPADA reduces constraint violations by 4% compared to diffusion-based methods and accelerates generation by nearly 9,500 times over LLM-based baselines.
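SPADA's code is not reproduced on this page, but the abstract's graph-traversal strategy with Gaussian kernel density estimation can be sketched. The Python below is a minimal illustration under several assumptions: features are numeric (ideally standardized), the LLM-induced graph is stood in for by a hand-written `parents` dict, and one shared bandwidth is used everywhere; the names `topological_order` and `sample_row` are hypothetical, not from the paper.

```python
import numpy as np
import pandas as pd

def topological_order(parents):
    """DFS topological sort of the feature DAG.
    `parents` maps each feature name to a list of its parent features."""
    order, seen = [], set()
    def visit(f):
        if f in seen:
            return
        for p in parents[f]:
            visit(p)
        seen.add(f)
        order.append(f)
    for f in parents:
        visit(f)
    return order

def sample_row(data, parents, bandwidth=1.0, rng=None):
    """Synthesize one row by traversing the dependency graph.

    Each feature is drawn from a Gaussian-KDE estimate of
    p(feature | parents): real rows are weighted by a Gaussian kernel
    on the distance between their parent values and the values already
    generated, one row is resampled by those weights, and kernel noise
    is added to its feature value. Root features (no parents) reduce to
    unconditional KDE sampling.
    """
    rng = rng or np.random.default_rng()
    row = {}
    for feat in topological_order(parents):
        if parents[feat]:
            ctx = np.array([row[p] for p in parents[feat]])
            dists = np.linalg.norm(data[parents[feat]].to_numpy() - ctx, axis=1)
            w = np.exp(-0.5 * (dists / bandwidth) ** 2) + 1e-12  # avoid zero sum
        else:
            w = np.ones(len(data))
        w /= w.sum()
        i = rng.choice(len(data), p=w)
        row[feat] = data[feat].iloc[i] + rng.normal(0.0, bandwidth)
    return row

# Hypothetical graph an LLM might induce: education depends on age,
# income depends on both.
parents = {"age": [], "education": ["age"], "income": ["age", "education"]}
real = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "education": [12, 16, 16, 18],
    "income": [30, 52, 61, 75],  # in thousands
})
print(sample_row(real, parents, bandwidth=5.0))
```

Per the abstract, the conditional normalizing-flow variant would replace the kernel-weighted resampling step with an invertible network trained for conditional density estimation of p(feature | parents); the sparse graph traversal itself stays the same.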
Similar Papers
Not All Features Deserve Attention: Graph-Guided Dependency Learning for Tabular Data Generation with Language Models
Computation and Language
Helps computers understand complex data tables better.
A Note on Statistically Accurate Tabular Data Generation Using Large Language Models
Machine Learning (CS)
Makes fake computer data more like real data.