Learning to Think Fast and Slow for Visual Language Models
By: Chenyu Lin, Cheng Chi, Jinlin Wu, and more
Potential Business Impact:
Helps AI think fast or slow like people.
When confronted with complex problems, we tend to think slowly; conversely, for simple questions, we think quickly. This two-system thinking mechanism allows us to allocate cognitive resources efficiently, enabling quick decisions on straightforward issues while reserving deeper analytical thinking for more intricate challenges. However, existing reasoning-oriented visual language models (VLMs), whether trained with explicit chain-of-thought annotations or rule-based reinforcement learning (RL) rewards, mainly pursue lengthy, detailed reasoning chains, which often incur excessive computational cost. In this work, we propose a simple RL approach that enables VLMs to automatically switch between fast and slow thinking modes depending on task difficulty. The approach consists of two stages: in the first stage, we label data as requiring either fast or slow thinking based on the base model's output length, inspired by the observation that pre-trained VLMs typically produce answers of varying lengths for different types of questions; in the second stage, we train the model with GRPO together with the thinking-mode labels to develop dual-mode thinking. Despite its simplicity, our model, named DualMindVLM, significantly outperforms the base model and achieves performance on par with state-of-the-art visual reasoning models, while maintaining exceptionally high token efficiency.
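To make the first stage concrete, below is a minimal sketch of how output-length-based labeling could look in Python. The helper names (`label_thinking_modes`, `generate_fn`) and the median-length cutoff are assumptions for illustration; the abstract only states that labels are derived from the pre-trained model's output lengths, not the exact rule.

```python
from statistics import median

def label_thinking_modes(samples, generate_fn, length_threshold=None):
    """Label each question as 'fast' or 'slow' from the base VLM's output length.

    samples: list of dicts, each with an 'image' and a 'question' field (assumed schema).
    generate_fn: callable running the pre-trained VLM, returning the answer text.
    length_threshold: token-count cutoff; if None, the median length is used (assumption).
    """
    # Run the base model once per sample and record how long its answer is.
    lengths = [len(generate_fn(s["image"], s["question"]).split()) for s in samples]
    cutoff = length_threshold if length_threshold is not None else median(lengths)

    labeled = []
    for sample, length in zip(samples, lengths):
        # Long answers suggest the question needs slow, deliberate reasoning;
        # short answers suggest fast thinking suffices.
        mode = "slow" if length > cutoff else "fast"
        labeled.append({**sample, "thinking_mode": mode})
    return labeled

# Stage 2 (not shown here): GRPO training conditioned on 'thinking_mode',
# so the model learns to emit short answers for 'fast' items and longer
# reasoning chains for 'slow' items.
```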
Similar Papers
Fast-Slow Thinking for Large Vision-Language Model Reasoning
Computation and Language
Makes AI think faster and smarter, using fewer words.
Look Again, Think Slowly: Enhancing Visual Reflection in Vision-Language Models
CV and Pattern Recognition
Helps computers "see" and think better.
VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search
Computation and Language
Helps computers think through pictures and words.