Incentivizing Tool-augmented Thinking with Images for Medical Image Analysis

Published: December 16, 2025 | arXiv ID: 2512.14157v1

By: Yankai Jiang, Yujie Zhang, Peng Zhang and more

Potential Business Impact:

Helps doctors diagnose illnesses by "thinking with images."

Business Areas:

Image Recognition, Data and Analytics, Software

Recent reasoning-based medical MLLMs have made progress in generating step-by-step textual reasoning chains. However, they still struggle with complex tasks that require dynamically and iteratively focusing on fine-grained visual regions to achieve precise grounding and diagnosis. We introduce Ophiuchus, a versatile, tool-augmented framework that equips an MLLM to (i) decide when additional visual evidence is needed, (ii) determine where to probe and ground within the medical image, and (iii) seamlessly weave the relevant sub-image content back into an interleaved, multimodal chain of thought. In contrast to prior approaches limited by the performance ceiling of specialized tools, Ophiuchus integrates the model's inherent grounding and perception capabilities with external tools, thereby fostering higher-level reasoning. The core of our method is a three-stage training strategy: cold-start training with tool-integrated reasoning data to achieve basic tool selection and adaptation for inspecting key regions; self-reflection fine-tuning to strengthen reflective reasoning and encourage revisiting tool outputs; and Agentic Tool Reinforcement Learning to directly optimize task-specific rewards and emulate expert-like diagnostic behavior. Extensive experiments show that Ophiuchus consistently outperforms both closed-source and open-source SOTA methods across diverse medical benchmarks, including VQA, detection, and reasoning-based segmentation. Our approach illuminates a path toward medical AI agents that can genuinely "think with images" through tool-integrated reasoning. Datasets, code, and trained models will be released publicly.
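As a rough illustration of the when / where / weave-back loop described in the abstract, here is a minimal Python sketch. It is not the authors' implementation: the names `crop`, `Trace`, `think_with_images`, and `brightest_pixel_policy` are hypothetical, the image is a toy nested list rather than a medical scan, and a hand-written rule stands in for the MLLM's decision about when and where to look.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple, Union

Image = List[List[int]]          # toy grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def crop(image: Image, box: Box) -> Image:
    """Toy zoom-in tool (hypothetical): return the sub-image inside `box`."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


@dataclass
class Trace:
    """Interleaved chain of thought: text steps and sub-images, in order."""
    steps: List[Union[str, Image]] = field(default_factory=list)


def think_with_images(
    image: Image,
    propose_region: Callable[[Image, Trace], Optional[Box]],
    max_steps: int = 3,
) -> Trace:
    """Run the loop: decide when to look, where to look, then weave it back."""
    trace = Trace()
    for _ in range(max_steps):
        # (i) when: the policy returns None once it has enough evidence.
        box = propose_region(image, trace)
        if box is None:
            break
        # (ii) where: record the region chosen for inspection.
        trace.steps.append(f"inspect region {box}")
        # (iii) weave back: append the sub-image to the reasoning trace.
        trace.steps.append(crop(image, box))
    trace.steps.append("answer")
    return trace


def brightest_pixel_policy(image: Image, trace: Trace) -> Optional[Box]:
    """Stand-in for the model (assumption): look once at a window of up to
    2x2 pixels anchored at the brightest pixel, then stop."""
    if any(not isinstance(step, str) for step in trace.steps):
        return None  # a sub-image is already in the trace
    height, width = len(image), len(image[0])
    row, col = max(
        ((r, c) for r in range(height) for c in range(width)),
        key=lambda rc: image[rc[0]][rc[1]],
    )
    return (row, col, min(row + 2, height), min(col + 2, width))
```

Calling `think_with_images(image, brightest_pixel_policy)` on a small grid yields a trace that alternates a text step, the cropped sub-image, and a final answer placeholder. In the framework described in the abstract, the region proposal would come from the trained model and its tools rather than from a fixed rule.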

OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning

CV and Pattern Recognition

AI learns to use tools to solve visual problems.

13 May 2025

87%

Guiding the Inner Eye: A Framework for Hierarchical and Flexible Visual Grounded Reasoning

CV and Pattern Recognition

Helps AI "see" and "think" about pictures better.

27 Nov 2025

87%

VICoT-Agent: A Vision-Interleaved Chain-of-Thought Framework for Interpretable Multimodal Reasoning and Scalable Remote Sensing Analysis

Artificial Intelligence

Helps computers understand pictures by thinking step-by-step.

25 Nov 2025

Country of Origin

🇨🇳 China

Repos / Data Links

github.com github.com

Page Count

39 pages