Explaining News Bias Detection: A Comparative SHAP Analysis of Transformer Model Decision Mechanisms
By: Himel Ghosh
Automated bias detection in news text is heavily used to support journalistic analysis and media accountability, yet little is known about how bias detection models arrive at their decisions or why they fail. In this work, we use SHAP-based explanations to present a comparative interpretability study of two transformer-based bias detection models, both fine-tuned on the BABE dataset: a bias detector and a domain-adapted pre-trained RoBERTa model. We analyze word-level attributions across correct and incorrect predictions to characterize how different model architectures operationalize linguistic bias. Our results show that although both models attend to similar categories of evaluative language, they differ substantially in how these signals are integrated into predictions. The bias detector model assigns stronger internal evidence to false positives than to true positives, indicating a misalignment between attribution strength and prediction correctness and contributing to systematic over-flagging of neutral journalistic content. In contrast, the domain-adaptive model exhibits attribution patterns that better align with prediction outcomes and produces 63% fewer false positives. We further demonstrate that model errors arise from distinct linguistic mechanisms, with false positives driven by discourse-level ambiguity rather than explicit bias cues. These findings highlight the importance of interpretability-aware evaluation for bias detection systems and suggest that architectural and training choices critically affect both model reliability and deployment suitability in journalistic contexts.
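The comparison of attribution strength across prediction outcomes can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that word-level attributions have already been produced by a SHAP explainer, and it takes "attribution strength" to be the sum of absolute token attributions per sentence, which is only one plausible definition. The function names, the toy attribution values, and the false-positive counts are all hypothetical and chosen for illustration.

```python
from collections import defaultdict
from statistics import mean


def outcome(label: int, pred: int) -> str:
    """Map a (gold label, prediction) pair to TP/FP/TN/FN; 1 means 'biased'."""
    if pred == 1:
        return "TP" if label == 1 else "FP"
    return "FN" if label == 1 else "TN"


def attribution_strength(token_attributions):
    """Assumed definition: sum of absolute word-level attributions for one sentence."""
    return sum(abs(a) for a in token_attributions)


def mean_strength_by_outcome(examples):
    """Average attribution strength within each outcome group."""
    groups = defaultdict(list)
    for label, pred, attrs in examples:
        groups[outcome(label, pred)].append(attribution_strength(attrs))
    return {name: mean(values) for name, values in groups.items()}


def relative_fp_reduction(fp_baseline: int, fp_other: int) -> float:
    """Fraction by which the second model reduces false positives."""
    return (fp_baseline - fp_other) / fp_baseline


# Hypothetical toy data: (gold label, predicted label, per-word attributions).
examples = [
    (1, 1, [0.5, 0.25, -0.25]),  # true positive
    (1, 1, [0.25, 0.25]),        # true positive
    (0, 1, [0.75, 0.5, 0.25]),   # false positive with strong evidence
    (0, 0, [-0.25, 0.0]),        # true negative
]

stats = mean_strength_by_outcome(examples)
print(stats)

# Hypothetical counts; the abstract reports only the relative reduction.
print(relative_fp_reduction(100, 37))
```

In this toy data the false-positive group has a higher mean strength than the true-positive group, which is the kind of misalignment between attribution strength and correctness that the abstract describes for the bias detector model.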