When Are Learning Biases Equivalent? A Unifying Framework for Fairness, Robustness, and Distribution Shift
By: Sushant Mehta
Potential Business Impact:
Fixes computer mistakes for fairness and accuracy.
Machine learning systems exhibit diverse failure modes, including unfairness toward protected groups, brittleness to spurious correlations, and poor performance on minority subpopulations, which are typically studied in isolation by distinct research communities. We propose a unifying theoretical framework that characterizes when different bias mechanisms produce quantitatively equivalent effects on model performance. By formalizing biases as violations of conditional independence, quantified through information-theoretic measures, we prove formal equivalence conditions relating spurious correlations, subpopulation shift, class imbalance, and fairness violations. Our theory predicts that a spurious correlation of strength $\alpha$ produces the same worst-group accuracy degradation as a subpopulation imbalance ratio $r \approx (1+\alpha)/(1-\alpha)$ under feature overlap assumptions. Empirical validation on six datasets and three architectures confirms that the predicted equivalences hold to within 3\% worst-group accuracy, enabling the principled transfer of debiasing methods across problem domains. This work bridges the literature on fairness, robustness, and distribution shift under a common perspective.
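To make the stated relation between spurious-correlation strength and imbalance ratio concrete, here is a minimal Python sketch that evaluates $r \approx (1+\alpha)/(1-\alpha)$ and its algebraic inverse $\alpha \approx (r-1)/(r+1)$. The function names are hypothetical, and the domain $0 \le \alpha < 1$ is an assumption made for this illustration; the abstract does not state the range of $\alpha$. The sketch reproduces only the formula, not the paper's proofs or experiments.

```python
def equivalent_imbalance_ratio(alpha: float) -> float:
    """Imbalance ratio r predicted to match a spurious correlation of strength alpha.

    Uses r = (1 + alpha) / (1 - alpha) from the abstract.
    Assumption (not from the source): alpha lies in [0, 1).
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1)")
    return (1.0 + alpha) / (1.0 - alpha)


def equivalent_spurious_strength(r: float) -> float:
    """Algebraic inverse of the relation above: alpha = (r - 1) / (r + 1).

    Assumption (not from the source): r >= 1, i.e. the ratio is
    majority size over minority size.
    """
    if r < 1.0:
        raise ValueError("r is assumed to be at least 1")
    return (r - 1.0) / (r + 1.0)


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 0.8, 0.9):
        r = equivalent_imbalance_ratio(alpha)
        print(f"alpha = {alpha:.2f}  ->  r is approximately {r:.2f}")
```

Under this reading, no spurious correlation ($\alpha = 0$) corresponds to balanced groups ($r = 1$), and $\alpha = 0.5$ corresponds to roughly a 3:1 imbalance. The ratio grows without bound as $\alpha$ approaches 1.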
Similar Papers
Software Fairness Dilemma: Is Bias Mitigation a Zero-Sum Game?
Machine Learning (CS)
Makes AI fairer without hurting anyone's performance.
Understanding challenges to the interpretation of disaggregated evaluations of algorithmic fairness
Machine Learning (Stat)
Checks if AI treats everyone fairly.
Learning with Statistical Equality Constraints
Machine Learning (CS)
Teaches computers to learn with strict rules.