CURE: Confidence-driven Unified Reasoning Ensemble Framework for Medical Question Answering
By: Ziad Elshaer, Essam A. Rashed
Potential Business Impact:
Helps doctors answer questions without expensive computers.
High-performing medical Large Language Models (LLMs) typically require extensive fine-tuning with substantial computational resources, limiting accessibility for resource-constrained healthcare institutions. This study introduces a confidence-driven multi-model framework that leverages model diversity to enhance medical question answering without fine-tuning. Our framework employs a two-stage architecture: a confidence detection module assesses the primary model's certainty, and an adaptive routing mechanism directs low-confidence queries to helper models with complementary knowledge for collaborative reasoning. We evaluate our approach using Qwen3-30B-A3B-Instruct, Phi-4 14B, and Gemma 2 12B across three medical benchmarks: MedQA, MedMCQA, and PubMedQA. Results demonstrate that our framework achieves competitive performance, with particularly strong results on PubMedQA (95.0%) and MedMCQA (78.0%). Ablation studies confirm that confidence-aware routing combined with multi-model collaboration substantially outperforms single-model approaches and uniform reasoning strategies. This work establishes that strategic model collaboration offers a practical, computationally efficient pathway to improve medical AI systems, with significant implications for democratizing access to advanced medical AI in resource-limited settings.
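The two-stage idea described in the abstract (check the primary model's confidence, then route low-confidence queries to helper models) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `route_answer`, the threshold of 0.8, the stub models, and the confidence-weighted vote used to combine answers are all assumptions made for illustration, since the abstract does not specify how confidence is measured or how the models collaborate.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Assumption: a "model" is any callable that maps a question to
# (answer, confidence), with confidence in [0, 1].
Model = Callable[[str], Tuple[str, float]]


def route_answer(
    question: str,
    primary: Model,
    helpers: List[Model],
    threshold: float = 0.8,  # hypothetical cutoff, not taken from the paper
) -> Tuple[str, str]:
    """Return (answer, route), where route is 'primary' or 'ensemble'."""
    # Stage 1: confidence detection on the primary model.
    answer, confidence = primary(question)
    if confidence >= threshold:
        return answer, "primary"

    # Stage 2: low confidence, so consult the helper models.
    # Assumption: answers are combined by a confidence-weighted vote.
    votes: Dict[str, float] = defaultdict(float)
    votes[answer] += confidence
    for helper in helpers:
        helper_answer, helper_confidence = helper(question)
        votes[helper_answer] += helper_confidence

    best = max(votes, key=lambda option: votes[option])
    return best, "ensemble"


# Stub models standing in for real LLM calls (purely illustrative).
def confident_primary(question: str) -> Tuple[str, float]:
    return "A", 0.95


def unsure_primary(question: str) -> Tuple[str, float]:
    return "A", 0.40


def helper_one(question: str) -> Tuple[str, float]:
    return "B", 0.70


def helper_two(question: str) -> Tuple[str, float]:
    return "B", 0.60
```

With these stubs, a confident primary answer is returned directly and the helpers are never called, while an unsure primary answer is outvoted by the two helpers. Skipping the helpers on high-confidence queries is where the compute savings in such a scheme would come from.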
Similar Papers
Learning to Trust the Crowd: A Multi-Model Consensus Reasoning Engine for Large Language Models
Artificial Intelligence
Makes AI answers more truthful and correct.
Collaboration among Multiple Large Language Models for Medical Question Answering
Computation and Language
Multiple AI doctors solve harder medical questions.
Structured Outputs Enable General-Purpose LLMs to be Medical Experts
Computation and Language
Helps AI give safer, smarter answers about health.