When Fairness Isn't Statistical: The Limits of Machine Learning in Evaluating Legal Reasoning

Published: June 4, 2025 | arXiv ID: 2506.03913v1

By: Claire Barale, Michael Rovatsos, Nehal Bhuta

Potential Business Impact:

Shows that statistical fairness checks fall short when applied to discretionary legal decisions.

Business Areas:
Predictive Analytics, Artificial Intelligence, Data and Analytics, Software

Legal decisions are increasingly evaluated for fairness, consistency, and bias using machine learning (ML) techniques. In high-stakes domains like refugee adjudication, such methods are often applied to detect disparities in outcomes. Yet it remains unclear whether statistical methods can meaningfully assess fairness in legal contexts shaped by discretion, normative complexity, and limited ground truth. In this paper, we empirically evaluate three common ML approaches (feature-based analysis, semantic clustering, and predictive modeling) on a large, real-world dataset of 59,000+ Canadian refugee decisions (AsyLex). Our experiments show that these methods produce divergent and sometimes contradictory signals, that predictive modeling often depends on contextual and procedural features rather than legal features, and that semantic clustering fails to capture substantive legal reasoning. We show limitations of statistical fairness evaluation, challenge the assumption that statistical regularity equates to fairness, and argue that current computational approaches fall short of evaluating fairness in legally discretionary domains. We argue that evaluating fairness in law requires methods grounded not only in data, but in legal reasoning and institutional context.
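To make concrete the kind of statistical fairness evaluation the paper critiques, here is a minimal sketch of an outcome-disparity check built on predictive modeling. This is not the authors' pipeline: the file name, column names, and model choice are hypothetical and do not reflect the actual AsyLex schema; it only illustrates how such checks typically surface "disparities" from contextual and procedural features.

```python
# Hypothetical sketch of a statistical fairness check: train an outcome
# classifier on case metadata and compare predicted grant rates across
# adjudicators. Column names are illustrative, not the AsyLex schema.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.read_csv("asylex_cases.csv")  # assumed flat export of case metadata

# Contextual/procedural features (judge, city, year) vs. a binary outcome.
features = ["judge_id", "hearing_city", "country_of_origin", "claim_year"]
X, y = df[features], df["claim_granted"]

pre = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), features)]
)
model = Pipeline([("pre", pre), ("clf", LogisticRegression(max_iter=1000))])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model.fit(X_tr, y_tr)

# A "disparity" signal: predicted grant rates per adjudicator. The paper's
# argument is that such statistical regularities are not, by themselves,
# evidence of (un)fairness in a discretionary legal process.
X_te = X_te.copy()
X_te["pred"] = model.predict(X_te)
print(X_te.groupby("judge_id")["pred"].mean().sort_values())
```

Note that a model like this can score well using purely procedural signals (which judge, which city, which year) while encoding nothing about the legal merits of a claim, which is precisely the gap between statistical regularity and fairness that the paper examines.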

Country of Origin
🇬🇧 United Kingdom

Page Count
11 pages

Category
Computer Science:
Computation and Language