ACG: Action Coherence Guidance for Flow-based VLA models
By: Minho Park, Kinam Kim, Junha Hyung, and more
Potential Business Impact:
Makes robots move more smoothly and accurately.
Diffusion and flow matching models have emerged as powerful robot policies, enabling Vision-Language-Action (VLA) models to generalize across diverse scenes and instructions. Yet when trained via imitation learning, their high generative capacity makes them sensitive to noise in human demonstrations: jerks, pauses, and jitter that reduce action coherence. Reduced action coherence causes instability and trajectory drift during deployment, failures that are catastrophic in fine-grained manipulation where precision is crucial. In this paper, we present Action Coherence Guidance (ACG) for VLA models, a training-free test-time guidance algorithm that improves action coherence and thereby yields performance gains. Evaluated on RoboCasa, DexMimicGen, and real-world SO-101 tasks, ACG consistently improves action coherence and boosts success rates across diverse manipulation tasks. Code and project page are available at https://github.com/DAVIAN-Robotics/ACG and https://DAVIAN-Robotics.github.io/ACG, respectively.
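The abstract does not spell out ACG's mechanism, so the sketch below only illustrates the generic shape such a training-free, test-time guidance term can take on a flow-matching action head: at each ODE integration step, the predicted velocity is extrapolated (classifier-free-guidance style) toward a coherence-promoting variant. All names here (`velocity_model`, `smooth_velocity`, `guidance_scale`) are hypothetical, and the moving-average smoothness surrogate is an assumption for illustration, not the paper's actual guidance signal.

```python
import torch
import torch.nn.functional as F

def smooth_velocity(v, kernel_size=3):
    """Toy coherence surrogate: moving-average the predicted velocity along
    the action horizon to damp high-frequency jitter. A stand-in, not ACG's
    actual guidance signal."""
    pad = kernel_size // 2
    vp = F.pad(v.t().unsqueeze(0), (pad, pad), mode="replicate")   # (1, D, H + 2*pad)
    return F.avg_pool1d(vp, kernel_size, stride=1).squeeze(0).t()  # back to (H, D)

def guided_flow_sample(velocity_model, obs, horizon, action_dim,
                       num_steps=10, guidance_scale=1.5):
    """Integrate the flow ODE from Gaussian noise to an action chunk,
    steering each velocity prediction toward its smoothed variant."""
    a = torch.randn(horizon, action_dim)   # initial noise sample
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((1,), i * dt)
        v = velocity_model(a, t, obs)       # base velocity field
        v_coh = smooth_velocity(v)          # coherence-promoting target
        v_guided = v + guidance_scale * (v_coh - v)  # CFG-style extrapolation
        a = a + dt * v_guided               # Euler step along the flow
    return a

# Usage with a stand-in velocity network (a real VLA flow head goes here):
policy = lambda a, t, obs: -a
actions = guided_flow_sample(policy, obs=None, horizon=16, action_dim=7)
print(actions.shape)  # torch.Size([16, 7])
```

Because the guidance only modifies velocities at sampling time, it leaves the trained policy untouched, which is what makes this family of methods training-free.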
Similar Papers
VLA-R1: Enhancing Reasoning in Vision-Language-Action Models
CV and Pattern Recognition
Teaches robots to think and do tasks.
CogVLA: Cognition-Aligned Vision-Language-Action Model via Instruction-Driven Routing & Sparsification
CV and Pattern Recognition
Teaches robots to do tasks faster and cheaper.
Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach
Robotics
Makes robots learn and do tasks better.