FlowVLA: Thinking in Motion with a Visual Chain of Thought
By: Zhide Zhong, Haodong Yan, Junfeng Li, and more
Potential Business Impact:
Helps robots learn manipulation skills more efficiently by reasoning about motion before predicting what a scene will look like.
Many Vision-Language-Action (VLA) models rely on an internal world model trained via next-frame prediction. This approach, however, struggles with physical reasoning as it entangles static appearance with dynamic motion, often resulting in implausible visual forecasts and inefficient policy learning. To address these limitations, we introduce the Visual Chain of Thought (Visual CoT): a pre-training framework that encourages a model to reason about how a scene evolves before predicting what it will look like. We instantiate this principle in FlowVLA, which predicts a future frame ($v_{t+1}$) only after generating an intermediate optical flow representation ($f_t$) that encodes motion dynamics. This "$v_t \rightarrow f_t \rightarrow v_{t+1}$" reasoning process is implemented within a single autoregressive Transformer, guiding the model to learn disentangled dynamics. As a result, FlowVLA produces coherent visual predictions and facilitates more efficient policy learning. Experiments on challenging robot manipulation benchmarks demonstrate state-of-the-art performance with substantially improved sample efficiency, pointing toward a more principled foundation for world modeling. Project page: https://irpn-lab.github.io/FlowVLA/
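To make the ordering concrete, here is a minimal sketch of how the "$v_t \rightarrow f_t \rightarrow v_{t+1}$" chain could be trained as plain next-token prediction in a single causal Transformer. Every specific here (a shared VQ codebook, 64 tokens per frame or flow map, the tiny backbone, the helper names) is an illustrative assumption, not FlowVLA's actual tokenizer or architecture.

```python
# Sketch only: a causal Transformer trained on the token order
# [frame v_t | flow f_t | next frame v_{t+1}], so the model must commit
# to motion before appearance. Tokenizer and sizes are assumptions.
import torch
import torch.nn as nn

VOCAB = 1024           # assumed shared codebook for frame and flow tokens
TOKENS_PER_IMAGE = 64  # assumed tokens per frame / per flow map
D_MODEL = 256

class VisualCoTWorldModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        self.pos = nn.Embedding(3 * TOKENS_PER_IMAGE, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, v_t, f_t, v_next):
        # Sequence order encodes the reasoning chain: frame, then flow, then next frame.
        seq = torch.cat([v_t, f_t, v_next], dim=1)  # (B, 3*T) token ids
        x = self.embed(seq) + self.pos(torch.arange(seq.size(1), device=seq.device))
        mask = nn.Transformer.generate_square_subsequent_mask(seq.size(1)).to(seq.device)
        h = self.backbone(x, mask=mask)             # causal self-attention
        # Supervise only the flow and next-frame tokens, conditioned on v_t.
        logits = self.head(h[:, TOKENS_PER_IMAGE - 1:-1])
        targets = seq[:, TOKENS_PER_IMAGE:]
        return nn.functional.cross_entropy(
            logits.reshape(-1, VOCAB), targets.reshape(-1)
        )

# Usage with random token ids standing in for VQ-encoded frames / flow maps:
B = 2
v_t    = torch.randint(0, VOCAB, (B, TOKENS_PER_IMAGE))
f_t    = torch.randint(0, VOCAB, (B, TOKENS_PER_IMAGE))
v_next = torch.randint(0, VOCAB, (B, TOKENS_PER_IMAGE))
loss = VisualCoTWorldModel()(v_t, f_t, v_next)
```

At inference time the same ordering means the model first generates all flow tokens autoregressively and only then generates the next-frame tokens, which is the "thinking in motion" step the abstract describes.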
Similar Papers
ManualVLA: A Unified VLA Model for Chain-of-Thought Manual Generation and Robotic Manipulation
Robotics
Robots learn to build things by watching goals.
GraphCoT-VLA: A 3D Spatial-Aware Reasoning Vision-Language-Action Model for Robotic Manipulation with Ambiguous Instructions
Robotics
Robots understand confusing orders and see in 3D.