HALF: Harm-Aware LLM Fairness Evaluation Aligned with Deployment
By: Ali Mekky, Omar El Herraoui, Preslav Nakov, and more
Potential Business Impact:
Tests AI for unfairness, weighting each mistake by how harmful it is.
Large language models (LLMs) are increasingly deployed across high-impact domains, from clinical decision support and legal analysis to hiring and education, making fairness and bias evaluation before deployment critical. However, existing evaluations lack grounding in real-world scenarios and do not account for differences in harm severity, e.g., a biased decision in surgery should not be weighed the same as a stylistic bias in text summarization. To address this gap, we introduce HALF (Harm-Aware LLM Fairness), a deployment-aligned framework that assesses model bias in realistic applications and weighs the outcomes by harm severity. HALF organizes nine application domains into three tiers (Severe, Moderate, Mild) using a five-stage pipeline. Our evaluation results across eight LLMs show that (1) LLMs are not consistently fair across domains, (2) neither model size nor overall performance guarantees fairness, and (3) reasoning models perform better in medical decision support but worse in education. We conclude that HALF exposes a clear gap between previous benchmarking success and deployment readiness.
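To make the harm-severity idea concrete, here is a minimal sketch of how per-domain bias scores could be aggregated with tier-dependent weights. The tier weights, the 0–1 bias scale, and the example domains and scores are illustrative assumptions, not the weighting scheme or results from the paper, whose actual aggregation is defined by its five-stage pipeline.

```python
# Hypothetical sketch of harm-aware aggregation. TIER_WEIGHTS and all
# numeric values below are assumptions for illustration only.
from dataclasses import dataclass
from typing import List

# Assumed weights reflecting harm severity (Severe > Moderate > Mild).
TIER_WEIGHTS = {"Severe": 3.0, "Moderate": 2.0, "Mild": 1.0}


@dataclass
class DomainResult:
    domain: str        # e.g., "medical decision support"
    tier: str          # "Severe", "Moderate", or "Mild"
    bias_score: float  # assumed scale: 0 = unbiased, 1 = maximally biased


def harm_weighted_bias(results: List[DomainResult]) -> float:
    """Aggregate per-domain bias into one harm-weighted score."""
    weighted_sum = sum(TIER_WEIGHTS[r.tier] * r.bias_score for r in results)
    total_weight = sum(TIER_WEIGHTS[r.tier] for r in results)
    return weighted_sum / total_weight


if __name__ == "__main__":
    example = [
        DomainResult("medical decision support", "Severe", 0.12),
        DomainResult("hiring", "Moderate", 0.08),
        DomainResult("text summarization", "Mild", 0.20),
    ]
    print(f"Harm-weighted bias: {harm_weighted_bias(example):.3f}")
```

Under this kind of scheme, the same raw bias in a Severe-tier domain contributes more to the overall score than in a Mild-tier one, which is the distinction the abstract draws between a biased surgical decision and a stylistic bias in summarization.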
Similar Papers
mFARM: Towards Multi-Faceted Fairness Assessment based on HARMs in Clinical Decision Support
Artificial Intelligence
Helps doctors give fair and accurate patient care.
Investigating Intersectional Bias in Large Language Models using Confidence Disparities in Coreference Resolution
Computation and Language
Finds AI unfairly favors some people over others.
Improving Fairness in LLMs Through Testing-Time Adversaries
Computation and Language
Makes AI fairer by spotting and fixing bias.