Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models
By: Dachuan Zhao, Weiyue Li, Zhenda Shen, and others
Potential Business Impact:
Fixes AI that unfairly judges people's pictures.
Vision-Language Models (VLMs) have become indispensable for multimodal reasoning, yet their representations often encode and amplify demographic biases, resulting in biased associations and misaligned predictions in downstream tasks. Such behavior undermines fairness and distorts the intended alignment between vision and language. Recent post-hoc approaches attempt to mitigate bias by replacing the most attribute-correlated embedding coordinates with neutral values. However, our systematic analysis reveals three critical failures of this coordinate-wise approach: feature entanglement, poor cross-dataset generalization, and incomplete bias removal. We find that bias is not localized to a few coordinates but is instead distributed across a few linear subspaces. To address these limitations, we propose $\textbf{S}$ubspace $\textbf{P}$rojection $\textbf{D}$ebiasing ($\textbf{SPD}$), a geometrically principled framework that identifies and removes the entire subspace of linearly decodable bias while reinserting a neutral mean component to preserve semantic fidelity. Extensive experiments across zero-shot classification, text-to-image retrieval, and image generation validate the effectiveness of SPD: our method achieves more robust debiasing with an average improvement of $18.5\%$ across four fairness metrics, while maintaining minimal loss in task performance compared to the best debiasing baseline.
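The core geometric idea, projecting embeddings onto the orthogonal complement of a linearly decodable bias subspace and reinserting a neutral mean component, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `debias_embeddings`, the use of SVD over attribute-direction vectors to estimate the subspace, and the subspace dimension `k` are all assumptions for demonstration.

```python
import numpy as np

def debias_embeddings(X, attr_vectors, k=2):
    """Illustrative subspace-projection debiasing (not the paper's exact method).

    X: (n, d) array of embeddings.
    attr_vectors: (m, d) attribute-direction vectors, e.g. differences
        between embeddings of attribute-laden prompt pairs.
    k: assumed dimension of the bias subspace.
    """
    # Estimate an orthonormal basis for the bias subspace from the
    # top-k left singular vectors of the attribute directions.
    U, _, _ = np.linalg.svd(attr_vectors.T, full_matrices=False)
    B = U[:, :k]  # (d, k) orthonormal basis

    # Project every embedding onto the orthogonal complement of the subspace,
    # removing all linearly decodable bias along those directions.
    X_debiased = X - (X @ B) @ B.T

    # Reinsert a neutral mean component inside the subspace so that
    # semantic content carried there is not zeroed out entirely.
    neutral_mean = X.mean(axis=0) @ B @ B.T
    return X_debiased + neutral_mean

# Toy usage with random data standing in for VLM embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16))
attrs = rng.normal(size=(4, 16))
Xd = debias_embeddings(X, attrs, k=2)
```

After this projection, every embedding has the same (mean) component inside the estimated bias subspace, so no linear probe restricted to that subspace can separate the examples by attribute.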
Similar Papers
Mitigating Coordinate Prediction Bias from Positional Encoding Failures
CV and Pattern Recognition
Helps computers find exact spots in pictures.
Investigating Spatial Attention Bias in Vision-Language Models
CV and Pattern Recognition
Computers see pictures left-to-right, not randomly.
BioPro: On Difference-Aware Gender Fairness for Vision-Language Models
Artificial Intelligence
Fixes AI's unfair gender pictures and words.