Evaluating the Challenges of LLMs in Real-world Medical Follow-up: A Comparative Study and An Optimized Framework

Published: December 22, 2025 | arXiv ID: 2512.18999v1

By: Jinyan Liu, Zikang Chen, Qinchuan Wang and more

Potential Business Impact:

Makes chatbots better at asking patients questions.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

When applied directly in an end-to-end manner to medical follow-up tasks, Large Language Models (LLMs) often suffer from uncontrolled dialogue flow and inaccurate information extraction due to the complexity of follow-up forms. To address this limitation, we designed and compared two follow-up chatbot systems: an end-to-end LLM-based system (control group) and a modular pipeline with structured process control (experimental group). Experimental results show that while the end-to-end approach frequently fails on lengthy and complex forms, our modular method, built on task decomposition, semantic clustering, and flow management, substantially improves dialogue stability and extraction accuracy. Moreover, it reduces the number of dialogue turns by 46.73% and lowers token consumption by 80% to 87.5%. These findings highlight the necessity of integrating external control mechanisms when deploying LLMs in high-stakes medical follow-up scenarios.
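
The abstract does not spell out the pipeline's internals, so the following is only a minimal sketch of the general idea of external flow control, not the authors' implementation. It assumes a form whose fields carry hand-assigned topic labels (a stand-in for semantic clustering) and a hypothetical `extract` callable that stands in for an LLM call restricted to one small group of fields. All names here (`FormField`, `FollowUpFlow`, `run_dialogue`, `extract`) are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class FormField:
    name: str   # key in the follow-up form
    topic: str  # hand-assigned topic label (assumption: stands in for semantic clustering)


@dataclass
class FollowUpFlow:
    """External flow controller: the program, not the model, decides what to ask next."""
    fields: list
    answers: dict = field(default_factory=dict)

    def clusters(self):
        # Task decomposition: group fields by topic so each turn covers one small sub-form.
        grouped = {}
        for f in self.fields:
            grouped.setdefault(f.topic, []).append(f.name)
        return grouped

    def next_question(self):
        # Return (topic, unanswered field names) for the first incomplete cluster, else None.
        for topic, names in self.clusters().items():
            missing = [n for n in names if n not in self.answers]
            if missing:
                return topic, missing
        return None

    def record(self, extracted):
        # Keep only keys that belong to the form; drop anything else the extractor returns.
        known = {f.name for f in self.fields}
        self.answers.update({k: v for k, v in extracted.items() if k in known})


def run_dialogue(flow, extract, max_turns=20):
    """Ask one cluster per turn until the form is complete or max_turns is reached."""
    turns = 0
    while turns < max_turns:
        step = flow.next_question()
        if step is None:
            break
        topic, missing = step
        # `extract` is hypothetical: in a real system it would prompt the patient
        # about `topic` and parse the reply into values for `missing` only.
        flow.record(extract(topic, missing))
        turns += 1
    return turns
```

Under these assumptions, each model call sees only the fields of one cluster rather than the whole form, and the loop terminates once every field has a value or the turn limit is hit, which is one plausible way to keep the conversation bounded.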

Page Count
10 pages

Category
Computer Science:
Computation and Language