A Multi-Agent System for Complex Reasoning in Radiology Visual Question Answering
By: Ziruo Yi, Jinyu Liu, Ting Xiao, and more
Potential Business Impact:
Helps doctors understand X-rays better and faster.
Radiology visual question answering (RVQA) provides precise answers to questions about chest X-ray images, alleviating radiologists' workload. While recent methods based on multimodal large language models (MLLMs) and retrieval-augmented generation (RAG) have shown promising progress in RVQA, they still face challenges in factual accuracy, hallucinations, and cross-modal misalignment. We introduce a multi-agent system (MAS) designed to support complex reasoning in RVQA, with specialized agents for context understanding, multimodal reasoning, and answer validation. We evaluate our system on a challenging RVQA set curated via model disagreement filtering, comprising cases that are consistently hard across multiple MLLMs. Extensive experiments show that our system outperforms strong MLLM baselines, and a case study illustrates its reliability and interpretability. This work highlights the potential of multi-agent approaches to support explainable and trustworthy clinical AI applications that require complex reasoning.
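The abstract names three agent roles (context understanding, multimodal reasoning, answer validation) but gives no implementation details. The following is a minimal sketch of how such roles might be chained, not the authors' method. Every name in it (`Draft`, `answer_question`, the three callables, `max_rounds`) is hypothetical, and the retry-on-rejection loop is an assumption made for illustration. In practice each callable would presumably wrap a model call; here they are toy stand-ins.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Draft:
    """Hypothetical record of one pipeline run."""
    question: str
    context: str
    answer: str
    accepted: bool


def answer_question(
    image_ref: str,
    question: str,
    understand: Callable[[str], str],            # context-understanding role
    reason: Callable[[str, str, str], str],      # multimodal-reasoning role
    validate: Callable[[str, str, str], bool],   # answer-validation role
    max_rounds: int = 2,                         # assumed retry budget
) -> Draft:
    """Chain the three roles; retry reasoning if validation rejects."""
    context = understand(question)
    answer = ""
    for _ in range(max_rounds):
        answer = reason(image_ref, question, context)
        if validate(question, context, answer):
            return Draft(question, context, answer, accepted=True)
    # Validation never passed: return the last answer, flagged as unaccepted.
    return Draft(question, context, answer, accepted=False)


# Toy stand-ins so the sketch runs without any model.
def toy_understand(question: str) -> str:
    return "context for: " + question


def toy_reason(image_ref: str, question: str, context: str) -> str:
    return "B"


def toy_validate(question: str, context: str, answer: str) -> bool:
    return answer in {"A", "B", "C", "D"}


result = answer_question(
    "cxr_001.png", "Which finding is present?",
    toy_understand, toy_reason, toy_validate,
)
print(result.answer, result.accepted)
```

Keeping the roles as plain callables means any one of them can be swapped or tested in isolation, which is one plausible reason to split the task across agents in the first place.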
Similar Papers
Agentic large language models improve retrieval-based radiology question answering
Computation and Language
Boosts AI accuracy in radiology diagnoses
RadAgents: Multimodal Agentic Reasoning for Chest X-ray Interpretation with Radiologist-like Workflows
Multiagent Systems
Helps doctors read X-rays better and more safely.