Beyond MedQA: Towards Real-world Clinical Decision Making in the Era of LLMs
By: Yunpeng Xiao, Carl Yang, Mark Mai, and more
Potential Business Impact:
Helps doctors make better choices using smart computer programs.
Large language models (LLMs) show promise for clinical use. They are often evaluated using datasets such as MedQA. However, many medical datasets, MedQA included, rely on simplified question-answering (QA) formats that underrepresent real-world clinical decision-making. Motivated by this gap, we propose a unifying paradigm that characterizes clinical decision-making tasks along two dimensions: Clinical Backgrounds and Clinical Questions. As the backgrounds and questions approach the real clinical environment, task difficulty increases. We summarize the settings of existing datasets and benchmarks along these two dimensions. We then review methods for clinical decision-making, including training-time and test-time techniques, and summarize when each helps. Next, we extend evaluation beyond accuracy to include efficiency and explainability. Finally, we highlight open challenges. Our paradigm clarifies assumptions, standardizes comparisons, and guides the development of clinically meaningful LLMs.