Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

Published: May 22, 2025 | arXiv ID: 2505.16854v2

By: Jiaqi Wang, Kevin Qinghong Lin, James Cheng, and more

Potential Business Impact:

Teaches AI to think only when needed.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Reinforcement Learning (RL) has proven to be an effective post-training strategy for enhancing reasoning in vision-language models (VLMs). Group Relative Policy Optimization (GRPO) is a prominent recent method that encourages models to generate complete reasoning traces before answering, which increases token usage and computational cost. Inspired by the human thinking process, in which people skip reasoning for easy questions but think carefully when needed, we explore how to enable VLMs to first decide when reasoning is necessary. To realize this, we propose TON, a two-stage training strategy: (i) a supervised fine-tuning (SFT) stage with a simple yet effective 'thought dropout' operation, in which reasoning traces are randomly replaced with empty thoughts, introducing a think-or-not format that serves as a cold start for selective reasoning; and (ii) a GRPO stage that lets the model freely explore when to think or not while maximizing task-aware outcome rewards. Experimental results show that TON can reduce completion length by up to 90% compared to vanilla GRPO without sacrificing performance, and in some cases while improving it. Further evaluations across diverse vision-language tasks, covering a range of reasoning difficulties with both 3B and 7B models, consistently show that the model progressively learns to bypass unnecessary reasoning steps as training advances. These findings shed light on the path toward human-like reasoning patterns in reinforcement learning approaches. Our code is available at https://github.com/kokolerk/TON.
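
The 'thought dropout' operation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `<think>`/`<answer>` tag format, the empty-thought string, the function names, and the default `drop_prob` value are all assumptions made for illustration, since the abstract does not specify them.

```python
import random
import re

# Assumed markup: the abstract does not state the exact tag format.
THINK_PATTERN = re.compile(r"<think>.*?</think>", flags=re.DOTALL)
EMPTY_THOUGHT = "<think>\n\n</think>"  # hypothetical "empty thought" placeholder


def thought_dropout(response: str, drop_prob: float, rng: random.Random) -> str:
    """With probability drop_prob, replace the reasoning trace with an empty thought."""
    if rng.random() < drop_prob:
        # Replace only the first reasoning span; the answer is left untouched.
        return THINK_PATTERN.sub(lambda _m: EMPTY_THOUGHT, response, count=1)
    return response


def apply_to_dataset(responses, drop_prob=0.5, seed=0):
    """Apply thought dropout independently to each SFT target (hypothetical helper)."""
    rng = random.Random(seed)
    return [thought_dropout(r, drop_prob, rng) for r in responses]
```

Applied to SFT targets, this yields a mix of examples with full reasoning traces and examples with empty thoughts, which is one way to read the "think-or-not format" that the abstract says serves as a cold start before the GRPO stage.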

Reinforcing Video Reasoning with Focused Thinking

CV and Pattern Recognition

Helps computers understand videos by focusing on important parts.

30 May 2025 2

91%

Plan Then Action: High-Level Planning Guidance Reinforcement Learning for LLM Reasoning

Artificial Intelligence

Makes AI think smarter and avoid mistakes.

2 Oct 2025 0

91%

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

Artificial Intelligence

Teaches AI to think through problems, not just copy.

17 Mar 2025 0

Page Count

32 pages
