MedXplain-VQA: Multi-Component Explainable Medical Visual Question Answering
By: Hai-Dang Nguyen, Minh-Anh Dang, Minh-Tan Le and more
Potential Business Impact:
Shows doctors why AI suggests a diagnosis.
Explainability is critical for the clinical adoption of medical visual question answering (VQA) systems, as physicians require transparent reasoning to trust AI-generated diagnoses. We present MedXplain-VQA, a comprehensive framework integrating five explainable AI components to deliver interpretable medical image analysis. The framework combines a fine-tuned BLIP-2 backbone, medical query reformulation, enhanced Grad-CAM attention, precise region extraction, and structured chain-of-thought reasoning via multimodal language models. To evaluate the system, we introduce a medical-domain-specific evaluation framework that replaces traditional NLP metrics with clinically relevant assessments, including terminology coverage, clinical structure quality, and attention region relevance. Experiments on 500 PathVQA histopathology samples demonstrate substantial improvements, with the enhanced system achieving a composite score of 0.683 compared to 0.378 for baseline methods, while maintaining high reasoning confidence (0.890). Our system identifies 3-5 diagnostically relevant regions per sample and generates structured explanations averaging 57 words with appropriate clinical terminology. Ablation studies reveal that query reformulation provides the most significant initial improvement, while chain-of-thought reasoning enables systematic diagnostic processes. These findings underscore the potential of MedXplain-VQA as a robust, explainable medical VQA system. Future work will focus on validation with medical experts and large-scale clinical datasets to ensure clinical readiness.
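The abstract names the evaluation components (terminology coverage, clinical structure quality, attention region relevance) and reports a composite score, but it does not say how each component is computed or how they are weighted. The following is only an illustrative sketch of how such a composite might be assembled. The function names, the token-overlap definition of coverage, the example vocabulary, and the equal default weights are all assumptions made for illustration, not the paper's method.

```python
import re
from typing import Dict, Optional, Set


def terminology_coverage(explanation: str, vocabulary: Set[str]) -> float:
    """Hypothetical metric: fraction of vocabulary terms found in the explanation."""
    if not vocabulary:
        return 0.0
    # Lowercase and split into alphabetic tokens before matching.
    tokens = set(re.findall(r"[a-z]+", explanation.lower()))
    return len(tokens & vocabulary) / len(vocabulary)


def composite_score(
    scores: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted mean of component scores (equal weights assumed by default)."""
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Toy inputs: a made-up vocabulary and made-up component scores.
vocab = {"nuclei", "mitotic", "stroma", "necrosis"}
text = "Enlarged nuclei and mitotic figures are visible in the stroma."
coverage = terminology_coverage(text, vocab)  # 3 of 4 terms matched

components = {"terminology": coverage, "structure": 0.5, "attention": 0.25}
print(coverage, composite_score(components))
```

A weighted mean keeps the composite in the same 0 to 1 range as its components, which makes a figure such as 0.683 versus 0.378 directly comparable across system variants. The actual weighting used in the paper may differ.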
Similar Papers
Medico 2025: Visual Question Answering for Gastrointestinal Imaging
CV and Pattern Recognition
Helps doctors understand stomach pictures better.
MedXAI: A Retrieval-Augmented and Self-Verifying Framework for Knowledge-Guided Medical Image Analysis
Machine Learning (CS)
Helps doctors find rare diseases in scans.
Multi-Task Learning for Visually Grounded Reasoning in Gastrointestinal VQA
CV and Pattern Recognition
Helps doctors understand medical images better.