Reinforcing Video Reasoning with Focused Thinking

Published: May 30, 2025 | arXiv ID: 2505.24718v3

By: Jisheng Dang, Jingze Wu, Teng Wang and more

Potential Business Impact:

Helps computers understand videos by focusing on important parts.

Business Areas:

Natural Language Processing Artificial Intelligence, Data and Analytics, Software

Recent advancements in reinforcement learning, particularly through Group Relative Policy Optimization (GRPO), have significantly improved multimodal large language models for complex reasoning tasks. However, two critical limitations persist: 1) these models often produce unfocused, verbose reasoning chains that obscure salient spatiotemporal cues, and 2) binary rewarding fails to account for partially correct answers, resulting in high reward variance and inefficient learning. In this paper, we propose TW-GRPO, a novel framework that enhances visual reasoning with focused thinking and dense reward granularity. Specifically, we employ a token weighting mechanism that prioritizes tokens with high informational density (estimated by intra-group information entropy), suppressing redundant tokens such as generic reasoning prefixes. Furthermore, we reformulate RL training by shifting from single-choice to multi-choice QA tasks, where soft rewards enable finer-grained gradient estimation by distinguishing partial correctness. Additionally, we propose question-answer inversion, a data augmentation strategy to generate diverse multi-choice samples from existing benchmarks. Experiments demonstrate state-of-the-art performance on several video reasoning and general understanding benchmarks. Notably, TW-GRPO achieves 50.4% accuracy on CLEVRER (18.8% improvement over Video-R1) and 65.8% on MMVU. Our code is available at https://github.com/longmalongma/TW-GRPO.
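
To make the contrast between binary and soft rewards concrete, here is a minimal sketch in Python. It is not taken from the paper: the abstract does not give the reward formula, so the Jaccard-overlap score below is an assumption chosen for illustration, and the function names (binary_reward, soft_reward, group_relative_advantages) are hypothetical. The group normalization follows the usual GRPO form of subtracting the group mean and dividing by the group standard deviation.

```python
from statistics import mean, pstdev


def binary_reward(predicted, correct):
    """Return 1.0 only when the selected options match the answer set exactly."""
    return 1.0 if set(predicted) == set(correct) else 0.0


def soft_reward(predicted, correct):
    """Partial credit via Jaccard overlap (an assumed stand-in, not the paper's formula)."""
    predicted, correct = set(predicted), set(correct)
    union = predicted | correct
    if not union:
        return 1.0  # both sets empty: treat as a perfect match
    return len(predicted & correct) / len(union)


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group (mean-centered, std-scaled)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One multi-choice question whose correct options are A and C,
# and four sampled responses, none of which is exactly right.
correct = {"A", "C"}
group = [{"A"}, {"A", "C", "D"}, {"B"}, {"A", "B"}]

binary = [binary_reward(p, correct) for p in group]
soft = [soft_reward(p, correct) for p in group]

print("binary rewards:   ", binary)
print("binary advantages:", group_relative_advantages(binary))
print("soft rewards:     ", soft)
print("soft advantages:  ", group_relative_advantages(soft))
```

In this toy group every response is wrong under exact matching, so the binary rewards are identical and the normalized advantages are all zero, leaving nothing to learn from. Under the assumed overlap score, the response that picks A and C plus one extra option ranks above the one that picks only B, so the group still yields a usable learning signal.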

Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

Artificial Intelligence

Teaches AI to think only when needed.

22 May 2025 1

91%

DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO

CV and Pattern Recognition

Helps AI understand videos better by learning smarter.

9 Jun 2025 1

91%

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

Artificial Intelligence

Teaches AI to think through problems, not just copy.

17 Mar 2025 0

Repos / Data Links

github.com

Page Count

23 pages
