TABFAIRGDT: A Fast Fair Tabular Data Generator using Autoregressive Decision Trees
By: Emmanouil Panagiotou, Benoît Ronval, Arjun Roy and more
Potential Business Impact:
Makes machine learning models fairer by generating less biased synthetic training data.
Ensuring fairness in machine learning remains a significant challenge, as models often inherit biases from their training data. Generative models have recently emerged as a promising approach to mitigate bias at the data level while preserving utility. However, many rely on deep architectures, despite evidence that simpler models can be highly effective for tabular data. In this work, we introduce TABFAIRGDT, a novel method for generating fair synthetic tabular data using autoregressive decision trees. To enforce fairness, we propose a soft leaf resampling technique that adjusts decision tree outputs to reduce bias while preserving predictive performance. Our approach is non-parametric and captures complex relationships between mixed feature types without relying on assumptions about the underlying data distributions. We evaluate TABFAIRGDT on benchmark fairness datasets and demonstrate that it outperforms state-of-the-art (SOTA) deep generative models, achieving a better fairness-utility trade-off for downstream tasks as well as higher synthetic data quality. Moreover, our method is lightweight, highly efficient, and CPU-compatible, and it requires no data pre-processing. TABFAIRGDT achieves a 72% average speedup over the fastest SOTA baseline across various dataset sizes and can generate fair synthetic data for medium-sized datasets (10 features, 10K samples) in just one second on a standard CPU, making it well suited to real-world fairness-sensitive applications.
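The abstract does not spell out how soft leaf resampling works, so the following is only a hypothetical illustration of the general idea, not the authors' algorithm. It assumes that the label is sampled from the decision-tree leaf a generated row falls into, and that fairness is encouraged by blending each sensitive group's positive rate in that leaf toward the leaf-wide rate. The function names `soften_leaf` and `sample_label`, the strength parameter `lam`, and the example numbers are all assumptions made for this sketch.

```python
import random


def soften_leaf(pos_rate_by_group, group_counts, lam):
    """Blend per-group positive rates in one leaf toward the leaf-wide rate.

    Hypothetical illustration only (not taken from the paper):
    lam = 0 keeps the original group rates, lam = 1 gives every
    sensitive group the same positive rate within the leaf.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    total = sum(group_counts.values())
    # Leaf-wide positive rate: group rates weighted by group size.
    leaf_rate = sum(
        pos_rate_by_group[g] * group_counts[g] for g in group_counts
    ) / total
    return {
        g: (1.0 - lam) * pos_rate_by_group[g] + lam * leaf_rate
        for g in group_counts
    }


def sample_label(rates, group, rng):
    """Draw a binary label for a synthetic row from the adjusted leaf rates."""
    return 1 if rng.random() < rates[group] else 0


# Made-up leaf statistics: group "a" receives the positive label
# twice as often as group "b" among the training rows in this leaf.
rates = {"a": 0.8, "b": 0.4}
counts = {"a": 50, "b": 50}

softened = soften_leaf(rates, counts, lam=0.5)
rng = random.Random(0)
label = sample_label(softened, "b", rng)
```

In this sketch, `lam` plays the role of a fairness-utility dial: small values stay close to the leaf's empirical distribution, while values near 1 remove the within-leaf dependence of the label on the sensitive attribute.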
Similar Papers
FairTabGen: Unifying Counterfactual and Causal Fairness in Synthetic Tabular Data Generation
Machine Learning (CS)
Creates fair fake data for computers.
CART-based Synthetic Tabular Data Generation for Imbalanced Regression
Machine Learning (CS)
Helps computers learn from rare data better.
Learning Decision Trees as Amortized Structure Inference
Machine Learning (CS)
Builds smarter computer programs that learn from data.