From Promising Capability to Pervasive Bias: Assessing Large Language Models for Emergency Department Triage
By: Joseph Lee, Tianqi Shang, Jae Young Baik and more
Potential Business Impact:
Helps doctors decide who needs care fastest.
Large Language Models (LLMs) have shown promise in clinical decision support, yet their application to triage remains underexplored. We systematically investigate the capabilities of LLMs in emergency department triage through two key dimensions: (1) robustness to distribution shifts and missing data, and (2) counterfactual analysis of intersectional biases across sex and race. We assess multiple LLM-based approaches, ranging from continued pre-training to in-context learning, as well as machine learning approaches. Our results indicate that LLMs exhibit superior robustness, and we investigate the key factors contributing to the promising LLM-based approaches. Furthermore, in this setting, we identify gaps in LLM preferences that emerge in particular intersections of sex and race. LLMs generally exhibit sex-based differences, but they are most pronounced in certain racial groups. These findings suggest that LLMs encode demographic preferences that may emerge in specific clinical contexts or particular combinations of characteristics.
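The abstract names counterfactual analysis of intersectional biases across sex and race without describing the mechanics. The following is a minimal sketch of one way such an analysis can be run. It assumes a triage predictor that maps a patient record to an acuity level, where 1 is most urgent. Each record is copied once per sex-race combination, only those two fields are changed, and predictions are compared within each race. Several pieces are hypothetical illustrations and not the paper's method: the demographic categories, the record fields, the `toy_predict` stand-in (a deliberately biased rule, not an LLM), and the mean female-minus-male gap metric.

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical demographic categories; the paper's actual groups may differ.
SEXES = ["female", "male"]
RACES = ["Asian", "Black", "Hispanic", "White"]

Record = Dict[str, object]


def make_counterfactuals(record: Record) -> Dict[Tuple[str, str], Record]:
    """Copy one record per (sex, race) pair, changing only those two fields."""
    return {
        (sex, race): {**record, "sex": sex, "race": race}
        for sex, race in product(SEXES, RACES)
    }


def intersectional_sex_gaps(
    records: List[Record], predict: Callable[[Record], int]
) -> Dict[str, float]:
    """Mean (female - male) predicted acuity within each race.

    A positive gap means female copies were assigned a less urgent level
    than otherwise identical male copies of the same patient.
    """
    gaps: Dict[str, List[int]] = {race: [] for race in RACES}
    for record in records:
        preds = {key: predict(cf) for key, cf in make_counterfactuals(record).items()}
        for race in RACES:
            gaps[race].append(preds[("female", race)] - preds[("male", race)])
    return {race: mean(values) for race, values in gaps.items()}


def toy_predict(record: Record) -> int:
    """Hypothetical stand-in for an LLM triage model, with bias injected on purpose."""
    acuity = 2 if int(record["heart_rate"]) > 120 else 3
    if record["sex"] == "female" and record["race"] == "Black":
        acuity += 1  # injected bias: one intersection gets a less urgent level
    return acuity


# Hypothetical example records; only the clinical fields matter, since
# sex and race are overwritten in every counterfactual copy.
patients: List[Record] = [
    {"complaint": "chest pain", "heart_rate": 130, "sex": "male", "race": "White"},
    {"complaint": "ankle sprain", "heart_rate": 80, "sex": "female", "race": "Asian"},
]

print(intersectional_sex_gaps(patients, toy_predict))
# prints {'Asian': 0, 'Black': 1, 'Hispanic': 0, 'White': 0}
```

Every counterfactual copy shares the same clinical fields, so any nonzero gap comes from the demographic change alone. Reporting the gap separately for each race, rather than pooling across races, is what exposes a sex effect that appears only in certain racial groups. A pooled average would dilute it.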
Similar Papers
A Counterfactual LLM Framework for Detecting Human Biases: A Case Study of Sex/Gender in Emergency Triage
Computers and Society
Finds hidden gender bias in medical decisions.
Bias in Large Language Models Across Clinical Applications: A Systematic Review
Computation and Language
Fixes AI mistakes in doctor's notes for fairness.
Large Language Models in Healthcare
Computers and Society
Helps doctors use smart computers for better patient care.