Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization

Published: January 7, 2026 | arXiv ID: 2601.04442v1

By: Xingjian Diao, Zheyuan Liu, Chunhui Zhang and more

Potential Business Impact:

Helps AI think faster and smarter.

Business Areas:

Image Recognition, Data and Analytics, Software

Large Vision-Language Models (LVLMs) have exhibited strong reasoning capabilities through chain-of-thought mechanisms that generate step-by-step rationales. However, such slow-thinking approaches often lead to overthinking, where models produce excessively verbose responses even for simple queries, resulting in test-time inefficiency and even degraded accuracy. Prior work has attempted to mitigate this issue via adaptive reasoning strategies, but these methods largely overlook a fundamental bottleneck: visual perception failures. We argue that stable reasoning critically depends on low-level visual grounding, and that reasoning errors often originate from imperfect perception rather than insufficient deliberation. To address this limitation, we propose Gated Perception-Reasoning Optimization (GPRO), a meta-reasoning controller that dynamically routes computation among three decision paths at each generation step: a lightweight fast path, a slow perception path for re-examining visual inputs, and a slow reasoning path for internal self-reflection. To learn this distinction, we derive large-scale failure attribution supervision from approximately 790k samples, using teacher models to distinguish perceptual hallucinations from reasoning errors. We then train the controller with multi-objective reinforcement learning to optimize the trade-off between task accuracy and computational cost under uncertainty. Experiments on five benchmarks demonstrate that GPRO substantially improves both accuracy and efficiency, outperforming recent slow-thinking methods while generating significantly shorter responses.
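
The abstract describes a controller that chooses among a fast path, a slow perception path, and a slow reasoning path, and that is trained with multi-objective reinforcement learning to balance accuracy against compute. Below is a toy sketch of that idea under stated assumptions. It is not the authors' GPRO implementation. Everything beyond the three path names is hypothetical: the linear softmax gate, the two "uncertainty" features, the per-path cost table, the weighted-sum reward with trade-off weight `lam`, REINFORCE with a running-mean baseline, and the simplification to one routing decision per query instead of one per generation step. The synthetic `toy_query`/`is_correct` environment stands in for the teacher-labelled split between perceptual and reasoning failures.

```python
import math
import random

# The three path names come from the abstract; every other detail here is a toy assumption.
PATHS = ("fast", "perception", "reasoning")

# Hypothetical relative compute cost of each path (not taken from the paper).
PATH_COST = {"fast": 0.0, "perception": 1.0, "reasoning": 1.0}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearGate:
    """Toy gate: one linear score per path over a small feature vector, then softmax."""

    def __init__(self, n_features, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.gauss(0.0, 0.01) for _ in range(n_features)] for _ in PATHS]

    def probs(self, x):
        logits = [sum(wj * xj for wj, xj in zip(row, x)) for row in self.w]
        return softmax(logits)

    def sample(self, x, rng):
        return rng.choices(range(len(PATHS)), weights=self.probs(x), k=1)[0]

    def reinforce_update(self, x, action, advantage, lr=0.1):
        """One REINFORCE step: w_k += lr * A * (1[k == action] - p_k) * x."""
        p = self.probs(x)
        for k in range(len(PATHS)):
            coeff = (1.0 if k == action else 0.0) - p[k]
            for j, xj in enumerate(x):
                self.w[k][j] += lr * advantage * coeff * xj


def scalarized_reward(correct, path, lam=0.5):
    """Weighted-sum scalarization of two objectives: accuracy minus lam * compute cost."""
    return (1.0 if correct else 0.0) - lam * PATH_COST[path]


def toy_query(rng):
    """Hypothetical query; features = [bias, perception_uncertainty, reasoning_uncertainty]."""
    kind = rng.choice(["easy", "perceptual", "reasoning"])
    x = [1.0, float(kind == "perceptual"), float(kind == "reasoning")]
    return kind, x


def is_correct(kind, path):
    """Toy rule: easy queries succeed on any path; hard ones need the matching slow path."""
    if kind == "easy":
        return True
    return (kind, path) in {("perceptual", "perception"), ("reasoning", "reasoning")}


def train(steps=5000, lam=0.5, seed=0):
    """REINFORCE on the toy environment, returning the trained gate."""
    rng = random.Random(seed)
    gate = LinearGate(n_features=3, seed=seed)
    baseline = 0.0  # running mean reward, used only to reduce gradient variance
    for _ in range(steps):
        kind, x = toy_query(rng)
        a = gate.sample(x, rng)
        r = scalarized_reward(is_correct(kind, PATHS[a]), PATHS[a], lam)
        gate.reinforce_update(x, a, r - baseline)
        baseline += 0.01 * (r - baseline)
    return gate


if __name__ == "__main__":
    gate = train()
    probes = {"easy": [1.0, 0.0, 0.0], "perceptual": [1.0, 1.0, 0.0], "reasoning": [1.0, 0.0, 1.0]}
    for name, x in probes.items():
        p = gate.probs(x)
        best = PATHS[p.index(max(p))]
        print(f"{name:>10}: route={best:<10} probs={[round(v, 2) for v in p]}")
```

With `lam=0.5`, the best routing in this toy setup is the fast path for easy queries and the matching slow path for each failure type. The learned probabilities still depend on the seed and the number of steps. The weight `lam` controls the trade-off. Once it reaches 1, a correct slow-path answer scores no better than a wrong fast one (1 - 1 = 0), so the gate has no incentive to think slowly at all.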

Guiding the Inner Eye: A Framework for Hierarchical and Flexible Visual Grounded Reasoning

CV and Pattern Recognition

Helps AI "see" and "think" about pictures better.

27 Nov 2025 2

Learning to Think Fast and Slow for Visual Language Models

CV and Pattern Recognition

Helps AI think fast or slow like people.

20 Nov 2025 1

From Sight to Insight: Improving Visual Reasoning Capabilities of Multimodal Models via Reinforcement Learning

CV and Pattern Recognition

Helps AI see and think better to solve puzzles.

1 Jan 2026 0

Country of Origin

🇺🇸 United States

Page Count

10 pages