Explanations as Bias Detectors: A Critical Study of Local Post-hoc XAI Methods for Fairness Exploration
By: Vasiliki Papanikou, Danae Pla Karidi, Evaggelia Pitoura, and more
Potential Business Impact:
Detects unfairness in AI systems so they can be made more just.
As Artificial Intelligence (AI) is increasingly used in areas that significantly impact human lives, concerns about fairness and transparency have grown, especially regarding its impact on protected groups. Recently, the intersection of explainability and fairness has emerged as an important area for promoting responsible AI systems. This paper explores how explainability methods can be leveraged to detect and interpret unfairness. We propose a pipeline that integrates local post-hoc explanation methods to derive fairness-related insights. During the pipeline design, we identify and address critical questions arising from the use of explanations as bias detectors, such as the relationship between distributive and procedural fairness, the effect of removing the protected attribute, the consistency and quality of results across different explanation methods, the impact of various aggregation strategies of local explanations on group fairness evaluations, and the overall trustworthiness of explanations as bias detectors. Our results demonstrate the potential of explanation methods for fairness analysis while highlighting the need to carefully consider the aforementioned critical aspects.
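To make the pipeline idea concrete, the sketch below illustrates one plausible instantiation: compute local post-hoc explanations (here, SHAP values) for every instance, then aggregate the attributions assigned to the protected attribute within each group. This is a minimal sketch, not the authors' actual pipeline; the dataset, feature names, model, and the mean-absolute-attribution aggregation strategy are all hypothetical choices made for illustration.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.linear_model import LogisticRegression

# Hypothetical tabular dataset with a binary protected attribute "sex".
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "sex": rng.integers(0, 2, 500),
    "income": rng.normal(50, 10, 500),
    "age": rng.integers(20, 65, 500),
})
y = (X["income"] + 5 * X["sex"] + rng.normal(0, 5, 500) > 55).astype(int)

model = LogisticRegression().fit(X, y)

# Local post-hoc explanations (SHAP values) for every instance.
explainer = shap.Explainer(model, X)
attributions = explainer(X).values  # shape: (n_samples, n_features)

# Aggregate the local attributions of the protected attribute per group:
# a large gap in mean |attribution| can flag the model as procedurally
# relying on the protected attribute for one group more than the other.
protected_idx = list(X.columns).index("sex")
for group in (0, 1):
    mask = (X["sex"] == group).to_numpy()
    mean_attr = np.abs(attributions[mask, protected_idx]).mean()
    print(f"group {group}: mean |SHAP(sex)| = {mean_attr:.3f}")
```

Other aggregation strategies (e.g., signed means, rank-based summaries) or other local explainers (e.g., LIME) could be swapped in at the aggregation step; as the abstract notes, the choice of explainer and aggregation can change the group-level fairness picture.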
Similar Papers
Argumentative Debates for Transparent Bias Detection [Technical Report]
Artificial Intelligence
Finds unfairness in AI by explaining its reasoning.
eXIAA: eXplainable Injections for Adversarial Attack
Machine Learning (CS)
Tricks AI into showing wrong reasons for its choices.
Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods
Computation and Language
Studies whether AI explanations work equally well across genders.