FairTabGen: Unifying Counterfactual and Causal Fairness in Synthetic Tabular Data Generation
By: Nitish Nagesh, Salar Shakibhamedan, Mahdi Bagheri and more
Potential Business Impact:
Creates fair fake data for computers.
Generating synthetic data is crucial in privacy-sensitive, data-scarce settings, especially for tabular datasets widely used in real-world applications. A key challenge is improving counterfactual and causal fairness while preserving high utility. We present FairTabGen, a fairness-aware large language model-based framework for tabular synthetic data generation. It integrates multiple fairness definitions, including counterfactual and causal fairness, into both its generation and evaluation pipelines. We use in-context learning, prompt refinement, and fairness-aware data curation to balance fairness and utility. Across diverse datasets, our method outperforms state-of-the-art GAN-based and LLM-based methods, achieving up to 10% improvements on fairness metrics such as demographic parity and path-specific causal effects while retaining statistical utility. Remarkably, it achieves these gains using less than 20% of the original data, highlighting its efficiency in low-data regimes. These results demonstrate a principled and practical approach for generating fair and useful synthetic tabular data.
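To make one of the fairness metrics named in the abstract concrete, the sketch below computes a demographic parity difference: the largest gap in positive-outcome rate between groups defined by a sensitive attribute. It is a minimal illustration of the standard definition, not the paper's evaluation code. The function name, the toy labels, and the group names "a" and "b" are all assumptions made for this example.

```python
from collections import defaultdict


def demographic_parity_difference(y_pred, groups):
    """Return (gap, rates): the largest difference in positive-outcome
    rate between any two groups, and the per-group rates themselves.

    Hypothetical helper for illustration; not taken from FairTabGen.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    # Positive-outcome rate per group, P(y_pred = 1 | group)
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates


# Toy data (assumed): binary outcomes and a two-valued sensitive attribute.
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap, rates = demographic_parity_difference(y_pred, groups)
print(rates)  # per-group positive rates
print(gap)    # 0 would mean equal positive rates across groups
```

A gap of 0 means every group receives positive outcomes at the same rate; larger values indicate a larger disparity. Path-specific causal effects, the other metric mentioned, require a causal graph and are not covered by this sketch.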
Similar Papers
Privacy-Preserving Fair Synthetic Tabular Data
Machine Learning (CS)
Creates private, fair data for sharing without bias.
Counterfactual Fairness Evaluation of Machine Learning Models on Educational Datasets
Computers and Society
Makes school AI treat all students fairly.
GenFacts-Generative Counterfactual Explanations for Multi-Variate Time Series
Machine Learning (CS)
Shows how to change data to get different results.