Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning

Published: January 14, 2026 | arXiv ID: 2601.09708v1

By: Chi-Pin Huang, Yunze Man, Zhiding Yu and more

Vision-Language-Action (VLA) tasks require reasoning over complex visual scenes and executing adaptive actions in dynamic environments. While recent studies on reasoning VLAs show that explicit chain-of-thought (CoT) can improve generalization, they suffer from high inference latency due to lengthy reasoning traces. We propose Fast-ThinkAct, an efficient reasoning framework that achieves compact yet performant planning through verbalizable latent reasoning. Fast-ThinkAct learns to reason efficiently with latent CoTs by distilling from a teacher, driven by a preference-guided objective that aligns manipulation trajectories and transfers both linguistic and visual planning capabilities for embodied control. This enables reasoning-enhanced policy learning that effectively connects compact reasoning to action execution. Extensive experiments across diverse embodied manipulation and reasoning benchmarks demonstrate that Fast-ThinkAct achieves strong performance with up to 89.3% reduced inference latency over state-of-the-art reasoning VLAs, while maintaining effective long-horizon planning, few-shot adaptation, and failure recovery.

ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

CV and Pattern Recognition

Robots learn to plan and fix mistakes.

22 Jul 2025

94%

Reasoning-VLA: A Fast and General Vision-Language-Action Reasoning Model for Autonomous Driving

CV and Pattern Recognition

Helps self-driving cars drive smarter and faster.

25 Nov 2025

