From Promising Capability to Pervasive Bias: Assessing Large Language Models for Emergency Department Triage

Published: April 22, 2025 | arXiv ID: 2504.16273v2

By: Joseph Lee, Tianqi Shang, Jae Young Baik and more

Potential Business Impact:

Helps doctors decide who needs care fastest.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Large Language Models (LLMs) have shown promise in clinical decision support, yet their application to triage remains underexplored. We systematically investigate the capabilities of LLMs in emergency department triage along two key dimensions: (1) robustness to distribution shifts and missing data, and (2) counterfactual analysis of intersectional biases across sex and race. We assess multiple LLM-based approaches, ranging from continued pre-training to in-context learning, alongside machine learning approaches. Our results indicate that LLMs exhibit superior robustness, and we investigate the key factors behind the promising LLM-based approaches. Furthermore, in this setting, we identify gaps in LLM preferences that emerge at particular intersections of sex and race. LLMs generally exhibit sex-based differences, but these differences are most pronounced in certain racial groups. These findings suggest that LLMs encode demographic preferences that may emerge in specific clinical contexts or particular combinations of characteristics.
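
The counterfactual analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the attribute lists, the record fields, and `toy_predict_acuity` (a deliberately biased stand-in for an LLM call) are all hypothetical, and the 1-to-5 acuity scale is an assumption. The idea is to hold the clinical content of each record fixed, vary only sex and race, and compare the mean predicted acuity across intersections.

```python
from itertools import product
from statistics import mean

# Illustrative attribute values (assumption, not the paper's actual categories).
SEXES = ["female", "male"]
RACES = ["Asian", "Black", "Hispanic", "White"]


def counterfactuals(record):
    """Yield copies of `record` that differ only in sex and race."""
    for sex, race in product(SEXES, RACES):
        yield {**record, "sex": sex, "race": race}


def intersectional_means(records, predict_acuity):
    """Mean predicted acuity per (sex, race) pair over all counterfactual copies."""
    scores = {}
    for record in records:
        for variant in counterfactuals(record):
            key = (variant["sex"], variant["race"])
            scores.setdefault(key, []).append(predict_acuity(variant))
    return {key: mean(values) for key, values in scores.items()}


def sex_gap_by_race(means):
    """Female-minus-male difference in mean predicted acuity within each race."""
    return {race: means[("female", race)] - means[("male", race)] for race in RACES}


def toy_predict_acuity(record):
    """Hypothetical stand-in for a model call; returns an assumed 1-5 acuity score."""
    score = 3
    if record["heart_rate"] > 120:
        score -= 1  # more urgent under the assumed scale (lower = more urgent)
    # Bias injected on purpose so the analysis has something to detect.
    if record["sex"] == "female" and record["race"] == "Black":
        score += 1
    return score


records = [
    {"chief_complaint": "chest pain", "heart_rate": 130},
    {"chief_complaint": "ankle sprain", "heart_rate": 80},
]
means = intersectional_means(records, toy_predict_acuity)
gaps = sex_gap_by_race(means)
```

Because every variant shares the same clinical fields, any nonzero entry in `gaps` is attributable to the demographic attributes alone. With the toy predictor, the gap appears only in the one racial group where the bias was injected, which mirrors the kind of intersection-specific pattern the abstract describes.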

Country of Origin
🇺🇸 United States

Page Count
16 pages

Category
Computer Science:
Artificial Intelligence