SQAP-VLA: A Synergistic Quantization-Aware Pruning Framework for High-Performance Vision-Language-Action Models
By: Hengyu Fang, Yijiang Liu, Yuan Du, and more
Potential Business Impact:
Makes smart robots run faster and use less power.
Vision-Language-Action (VLA) models exhibit unprecedented capabilities for embodied intelligence. However, their extensive computational and memory costs hinder practical deployment. Existing VLA compression and acceleration approaches apply quantization or token pruning in an ad-hoc manner, but an observed incompatibility between the two prevents combining them for a holistic efficiency improvement. This work introduces SQAP-VLA, the first structured, training-free VLA inference acceleration framework that simultaneously enables state-of-the-art quantization and token pruning. We overcome the incompatibility by co-designing the quantization and token pruning pipeline: we propose new quantization-aware token pruning criteria that remain effective on an aggressively quantized model, and we improve the quantizer design to enhance pruning effectiveness. When applied to standard VLA models, SQAP-VLA yields significant gains in computational efficiency and inference speed while preserving core model performance, achieving a 1.93$\times$ speedup and up to a 4.5\% improvement in average success rate over the original model.
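The abstract does not spell out SQAP-VLA's actual pruning criterion or quantizer design, but the co-design idea can be illustrated with a minimal sketch: score visual tokens using the *quantized* attention pathway (so the pruning decision matches what the quantized model actually computes), then keep only the highest-scoring tokens. All function names below (fake_quantize, token_importance, prune_tokens) and the specific scoring rule are hypothetical, chosen only to make the concept concrete; they are not the paper's method.

```python
# Illustrative sketch of quantization-aware token pruning (assumptions, not SQAP-VLA's code).
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Simulate aggressive per-tensor uniform quantization (quantize-dequantize)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return (x / scale).round().clamp(-qmax - 1, qmax) * scale

def token_importance(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Score each key token by the attention mass it receives AFTER quantization,
    so pruning is consistent with the quantized model's behavior."""
    attn = torch.softmax(q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5, dim=-1)
    return attn.mean(dim=(0, 1, 2))  # average over batch, heads, and queries

def prune_tokens(tokens: torch.Tensor, scores: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Keep the top-scoring fraction of tokens, preserving their original order."""
    keep = max(1, int(tokens.shape[1] * keep_ratio))
    idx = scores.topk(keep).indices.sort().values
    return tokens[:, idx]

# Toy usage: 1 image, 256 visual tokens, 64-dim features, 4 attention heads.
tokens = torch.randn(1, 256, 64)
q = fake_quantize(torch.randn(1, 4, 256, 64))  # quantized queries
k = fake_quantize(torch.randn(1, 4, 256, 64))  # quantized keys
pruned = prune_tokens(tokens, token_importance(q, k), keep_ratio=0.5)
print(pruned.shape)  # torch.Size([1, 128, 64])
```

The key design point the sketch captures is ordering: importance is computed from quantized tensors rather than full-precision ones, so tokens that only look important under full precision are not mistakenly retained by the quantized model.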
Similar Papers
SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration
CV and Pattern Recognition
Makes robots act faster and smarter.
SpecPrune-VLA: Accelerating Vision-Language-Action Models via Action-Aware Self-Speculative Pruning
CV and Pattern Recognition
Makes robots learn and act faster.
EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models
CV and Pattern Recognition
Makes robots learn tasks much faster.