BLURR: A Boosted Low-Resource Inference for Vision-Language-Action Models
By: Xiaoyu Ma, Zhengqing Yuan, Zheyuan Zhang, and more
Vision-language-action (VLA) models enable impressive zero-shot manipulation, but their inference stacks are often too heavy for responsive web demos or high-frequency robot control on commodity GPUs. We present BLURR, a lightweight inference wrapper that can be plugged into existing VLA controllers without retraining or changing model checkpoints. Instantiated on the pi-zero VLA controller, BLURR keeps the original observation interfaces and accelerates control by combining an instruction-prefix key-value cache, mixed-precision execution, and a single-step rollout schedule that reduces per-step computation. In our SimplerEnv-based evaluation, BLURR maintains task success rates comparable to the original controller while substantially lowering effective FLOPs and wall-clock latency. We also build an interactive web demo that lets users switch between controllers and toggle inference options in real time while watching manipulation episodes. Together, these results highlight BLURR as a practical approach for deploying modern VLA policies under tight compute budgets.
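To make the three ingredients concrete, below is a minimal PyTorch sketch of what such a wrapper could look like. The class and method names (`BlurrStyleWrapper`, `encode_prefix`, `decode_action`) and the toggle flags are illustrative assumptions for a generic transformer-based VLA controller, not BLURR's actual API.

```python
# Hypothetical sketch of a BLURR-style inference wrapper (names are assumed,
# not taken from the BLURR codebase): cache the instruction-prefix KV states
# once per episode, run the backbone under mixed precision, and perform a
# single forward pass ("single-step rollout") per control step.
import torch


class BlurrStyleWrapper:
    def __init__(self, policy, use_prefix_cache=True, use_fp16=True):
        self.policy = policy                # existing VLA controller, unchanged checkpoint
        self.use_prefix_cache = use_prefix_cache
        self.use_fp16 = use_fp16
        self._prefix_cache = None           # KV states for the fixed instruction prefix

    @torch.no_grad()
    def reset(self, instruction_tokens):
        """Precompute the instruction-prefix KV cache once per episode."""
        if self.use_prefix_cache:
            # Assumed interface: encode_prefix returns transformer past_key_values.
            self._prefix_cache = self.policy.encode_prefix(instruction_tokens)
        else:
            self._prefix_cache = None

    @torch.no_grad()
    def act(self, observation):
        """Single-step rollout: one mixed-precision forward pass per control step."""
        with torch.autocast("cuda", dtype=torch.float16, enabled=self.use_fp16):
            # Assumed interface: decode_action consumes the cached prefix so the
            # instruction tokens are not re-encoded at every control step.
            action = self.policy.decode_action(
                observation, past_key_values=self._prefix_cache
            )
        return action.float()               # hand actions back to the env in full precision
```

The design choice the sketch illustrates is that the language instruction is fixed for the whole episode, so its prefix states only need to be computed in `reset`; every subsequent `act` call reuses them, which is where the per-step FLOP and latency savings come from.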