Beyond Synthetic Augmentation: Group-Aware Threshold Calibration for Robust Balanced Accuracy in Imbalanced Learning
By: Hunter Gittlin
Potential Business Impact:
Makes AI fairer for different groups of people.
Class imbalance remains a fundamental challenge in machine learning, with traditional solutions often creating as many problems as they solve. We demonstrate that group-aware threshold calibration (setting different decision thresholds for different demographic groups) provides superior robustness compared to synthetic data generation methods. Through extensive experiments, we show that group-specific thresholds achieve 1.5-4% higher balanced accuracy than SMOTE and CT-GAN augmented models while improving worst-group balanced accuracy. Unlike single-threshold approaches that apply one cutoff across all groups, our group-aware method optimizes the Pareto frontier between balanced accuracy and worst-group balanced accuracy, enabling fine-grained control over group-level performance. Critically, we find that applying group thresholds to synthetically augmented data yields minimal additional benefit, suggesting these approaches are fundamentally redundant. Our results span seven model families including linear, tree-based, instance-based, and boosting methods, confirming that group-aware threshold calibration offers a simpler, more interpretable, and more effective solution to class imbalance.
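The core idea in the abstract can be sketched in a few lines: on a validation set, pick a separate decision threshold for each demographic group by maximizing that group's balanced accuracy, then apply each example's group-specific threshold at prediction time. This is a minimal illustrative sketch, not the paper's implementation; the function names, the grid of candidate thresholds, and the per-group maximization objective are assumptions for illustration.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of true-positive rate and true-negative rate for binary labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tpr = np.mean(y_pred[y_true == 1] == 1) if np.any(y_true == 1) else 0.0
    tnr = np.mean(y_pred[y_true == 0] == 0) if np.any(y_true == 0) else 0.0
    return 0.5 * (tpr + tnr)

def fit_group_thresholds(scores, y_true, groups,
                         grid=np.linspace(0.05, 0.95, 19)):
    """For each group, choose the threshold on held-out scores that
    maximizes that group's balanced accuracy (illustrative objective)."""
    thresholds = {}
    for g in np.unique(groups):
        mask = groups == g
        thresholds[g] = max(
            grid,
            key=lambda t: balanced_accuracy(y_true[mask], scores[mask] >= t),
        )
    return thresholds

def predict_with_group_thresholds(scores, groups, thresholds):
    """Apply each example's group-specific cutoff to its model score."""
    return np.array([int(s >= thresholds[g]) for s, g in zip(scores, groups)])
```

As a toy usage example: if one group's positives score above 0.6 while another group's positives score above 0.3, no single global cutoff separates both groups, but per-group thresholds can, which is the intuition behind the balanced-accuracy gains the abstract reports.
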
Similar Papers
Bias-Corrected Data Synthesis for Imbalanced Learning
Machine Learning (Stat)
Fixes computer guessing when most examples are wrong.
Multiaccuracy and Multicalibration via Proxy Groups
Machine Learning (Stat)
Makes computer decisions fair even with missing data.
Class-Conditional Distribution Balancing for Group Robust Classification
Machine Learning (CS)
Fixes computer guesses that are wrong for bad reasons.