Evaluating the Challenges of LLMs in Real-world Medical Follow-up: A Comparative Study and An Optimized Framework
By: Jinyan Liu, Zikang Chen, Qinchuan Wang, and more
Potential Business Impact:
Makes chatbots better at asking patients questions.
When applied directly in an end-to-end manner to medical follow-up tasks, Large Language Models (LLMs) often suffer from uncontrolled dialogue flow and inaccurate information extraction due to the complexity of follow-up forms. To address this limitation, we designed and compared two follow-up chatbot systems: an end-to-end LLM-based system (control group) and a modular pipeline with structured process control (experimental group). Experimental results show that while the end-to-end approach frequently fails on lengthy and complex forms, our modular method, built on task decomposition, semantic clustering, and flow management, substantially improves dialogue stability and extraction accuracy. Moreover, it reduces the number of dialogue turns by 46.73% and lowers token consumption by 80% to 87.5%. These findings highlight the necessity of integrating external control mechanisms when deploying LLMs in high-stakes medical follow-up scenarios.
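To make the architectural contrast concrete, below is a minimal sketch of what such a modular pipeline could look like. It is not the authors' implementation: the form representation, the prefix-based stand-in for semantic clustering, and the `ask_llm` placeholder are all assumptions for illustration. The key idea it demonstrates is that dialogue state lives in a deterministic outer loop (flow management), while the LLM is invoked only for narrow, per-cluster extraction, which is consistent with the reported reductions in turns and token usage.

```python
from dataclasses import dataclass


@dataclass
class FormItem:
    """One field of the follow-up form to be filled during the dialogue."""
    key: str
    question: str
    answer: str | None = None


def cluster_items(items: list[FormItem]) -> list[list[FormItem]]:
    """Stand-in for semantic clustering: group related form items so they
    can be asked together in one turn. A real system might embed the
    question text and cluster the embeddings; here we group by a topic
    prefix in the key (e.g. "symptoms.pain") purely for illustration."""
    clusters: dict[str, list[FormItem]] = {}
    for item in items:
        topic = item.key.split(".")[0]
        clusters.setdefault(topic, []).append(item)
    return list(clusters.values())


def ask_llm(prompt: str) -> str:
    """Placeholder for a single-turn LLM call; swap in any
    chat-completion client here."""
    raise NotImplementedError


def extract_answers(cluster: list[FormItem], patient_reply: str) -> dict[str, str]:
    """Use the LLM only for narrow extraction over one cluster, instead of
    letting it drive the whole dialogue and re-read the entire form."""
    keys = ", ".join(i.key for i in cluster)
    prompt = (
        f"Extract values for the fields [{keys}] from the patient reply "
        f"below. Answer with one 'key: value' pair per line; write "
        f"'unknown' for any field not mentioned.\nReply: {patient_reply}"
    )
    parsed: dict[str, str] = {}
    for line in ask_llm(prompt).splitlines():
        if ":" in line:
            k, v = line.split(":", 1)
            parsed[k.strip()] = v.strip()
    return parsed


def run_followup(items: list[FormItem], get_patient_reply) -> dict[str, str]:
    """Flow management: a deterministic loop walks the clusters in order,
    so dialogue progress is controlled outside the model. Each cluster is
    a single turn of grouped questions followed by one extraction call."""
    for cluster in cluster_items(items):
        questions = " ".join(i.question for i in cluster)
        reply = get_patient_reply(questions)        # ask grouped questions
        answers = extract_answers(cluster, reply)   # narrow LLM extraction
        for item in cluster:
            item.answer = answers.get(item.key, "unknown")
    return {i.key: i.answer for i in items}
```

In this sketch the number of turns scales with the number of clusters rather than the number of form fields, and each prompt carries only one cluster's context rather than the full form, which is the mechanism by which a structured pipeline of this kind could cut both turn count and token consumption relative to an end-to-end chatbot.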
Similar Papers
Enabling Doctor-Centric Medical AI with LLMs through Workflow-Aligned Tasks and Benchmarks
Computation and Language
Helps doctors use AI for better patient care.
Identifying Imaging Follow-Up in Radiology Reports: A Comparative Analysis of Traditional ML and LLM Approaches
Computation and Language
Helps doctors know if patients need more scans.
Large Language Models in Healthcare
Computers and Society
Helps doctors use smart computers for better patient care.