Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models

Published: March 18, 2025 | arXiv ID: 2503.13939v4

By: Yuxiang Lai, Jike Zhong, Ming Li and more

Potential Business Impact:

Helps doctors understand X-rays better and faster.

Business Areas:

Image Recognition, Data and Analytics, Software

Vision-language models (VLMs) have achieved impressive progress in natural image reasoning, yet their potential in medical imaging remains underexplored. Medical vision-language tasks demand precise understanding and clinically coherent answers, which are difficult to achieve due to the complexity of medical data and the scarcity of high-quality expert annotations. These challenges limit the effectiveness of conventional supervised fine-tuning (SFT) and Chain-of-Thought (CoT) strategies that work well in general domains. To address these challenges, we propose Med-R1, a reinforcement learning (RL)-enhanced vision-language model designed to improve generalization and reliability in medical reasoning. Built on the DeepSeek strategy, Med-R1 adopts Group Relative Policy Optimization (GRPO) to encourage reward-guided learning beyond static annotations. We comprehensively evaluate Med-R1 across eight distinct medical imaging modalities. Med-R1 achieves a 29.94% improvement in average accuracy over its base model Qwen2-VL-2B, and even outperforms Qwen2-VL-72B, a model with 36x more parameters. To assess cross-task generalization, we further evaluate Med-R1 on five question types. Med-R1 outperforms Qwen2-VL-2B by 32.06% in question-type generalization, also surpassing Qwen2-VL-72B. We further explore the thinking process in Med-R1, a crucial component for the success of DeepSeek-R1. Our results show that omitting intermediate rationales (No-Thinking-Med-R1) not only improves in-domain and cross-domain generalization with less training, but also challenges the assumption that more reasoning always helps. These findings suggest that in medical VQA, it is not reasoning itself, but its quality and domain alignment, that determine effectiveness. Together, these results highlight that RL improves medical reasoning and generalization, enabling efficient and reliable VLMs for real-world deployment.
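The abstract names Group Relative Policy Optimization (GRPO) but does not spell out how it works. As a rough illustration only, the sketch below shows the group-relative advantage step that GRPO is generally known for: several answers are sampled for the same prompt, each is scored by a reward, and each reward is normalized against the mean and standard deviation of its own group. The function name, the binary reward values, and the use of the population standard deviation are assumptions made for this example; they are not taken from the Med-R1 paper or its code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sampled answer's reward against its own group.

    Hypothetical helper for illustration; eps avoids division by zero
    when every answer in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Assumed rewards for four sampled answers to one medical VQA question:
# 1.0 for a correct choice, 0.0 for an incorrect one.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group where every answer is scored the same carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Answers scored above their group's mean get a positive advantage and are reinforced, and those below get a negative one, so no separately trained value model is needed to provide a baseline.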

RARL: Improving Medical VLM Reasoning and Generalization with Reinforcement Learning and LoRA under Data and Hardware Constraints

CV and Pattern Recognition

Helps doctors understand medical pictures better.

7 Jun 2025

Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

CV and Pattern Recognition

Teaches computers to solve math problems better.

9 Mar 2025

MedGR$^2$: Breaking the Data Barrier for Medical Reasoning via Generative Reward Learning

Machine Learning (CS)

Makes AI learn medicine from generated data.

28 Aug 2025

Country of Origin

🇺🇸 United States

Page Count

10 pages
