Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models
By: Yuxiang Lai , Jike Zhong , Ming Li and more
Potential Business Impact:
Helps doctors understand X-rays better and faster.
Vision-language models (VLMs) have achieved impressive progress in natural image reasoning, yet their potential in medical imaging remains underexplored. Medical vision-language tasks demand precise understanding and clinically coherent answers, which are difficult to achieve due to the complexity of medical data and the scarcity of high-quality expert annotations. These challenges limit the effectiveness of conventional supervised fine-tuning (SFT) and Chain-of-Thought (CoT) strategies that work well in general domains. To address these challenges, we propose Med-R1, a reinforcement learning (RL)-enhanced vision-language model designed to improve generalization and reliability in medical reasoning. Built on the DeepSeek strategy, Med-R1 adopts Group Relative Policy Optimization (GRPO) to encourage reward-guided learning beyond static annotations. We comprehensively evaluate Med-R1 across eight distinct medical imaging modalities. Med-R1 achieves a 29.94% improvement in average accuracy over its base model Qwen2-VL-2B, and even outperforms Qwen2-VL-72B-a model with 36x more parameters. To assess cross-task generalization, we further evaluate Med-R1 on five question types. Med-R1 outperforms Qwen2-VL-2B by 32.06% in question-type generalization, also surpassing Qwen2-VL-72B. We further explore the thinking process in Med-R1, a crucial component for the success of Deepseek-R1. Our results show that omitting intermediate rationales (No-Thinking-Med-R1) not only improves in-domain and cross-domain generalization with less training, but also challenges the assumption that more reasoning always helps. These findings suggest that in medical VQA, it is not reasoning itself, but its quality and domain alignment, that determine effectiveness. Together, these results highlight that RL improves medical reasoning and generalization, enabling efficient and reliable VLMs for real-world deployment.
Similar Papers
RARL: Improving Medical VLM Reasoning and Generalization with Reinforcement Learning and LoRA under Data and Hardware Constraints
CV and Pattern Recognition
Helps doctors understand medical pictures better.
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
CV and Pattern Recognition
Teaches computers to solve math problems better.
MedGR$^2$: Breaking the Data Barrier for Medical Reasoning via Generative Reward Learning
Machine Learning (CS)
Makes AI learn medicine from generated data.