Score: 1

MAP-VLA: Memory-Augmented Prompting for Vision-Language-Action Model in Robotic Manipulation

Published: November 12, 2025 | arXiv ID: 2511.09516v1

By: Runhao Li, Wenkai Guo, Zhenyu Wu, and more

Potential Business Impact:

Helps robots remember what they have already done so they can complete long, multi-step manipulation jobs.

Business Areas:
Navigation and Mapping

Pre-trained Vision-Language-Action (VLA) models have achieved remarkable success in improving robustness and generalization for end-to-end robotic manipulation. However, these models struggle with long-horizon tasks due to their lack of memory and reliance solely on immediate sensory inputs. To address this limitation, we propose Memory-Augmented Prompting for Vision-Language-Action model (MAP-VLA), a novel framework that empowers pre-trained VLA models with demonstration-derived memory prompts to augment action generation for long-horizon robotic manipulation tasks. To achieve this, MAP-VLA first constructs a memory library from historical demonstrations, where each memory unit captures information about a specific stage of a task. These memory units are implemented as learnable soft prompts optimized through prompt tuning. Then, during real-time task execution, MAP-VLA retrieves relevant memory through trajectory similarity matching and dynamically integrates it into the VLA model for augmented action generation. Importantly, this prompt tuning and retrieval augmentation approach operates as a plug-and-play module for a frozen VLA model, offering a lightweight and flexible solution to improve task performance. Experimental results show that MAP-VLA delivers up to 7.0% absolute performance gains in the simulation benchmark and 25.0% on real robot evaluations for long-horizon tasks, surpassing the current state-of-the-art methods.
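To make the pipeline concrete, below is a minimal sketch of the two components the abstract describes: a memory library of learnable soft prompts keyed by demonstration trajectory segments, and retrieval by trajectory similarity at execution time, with the retrieved prompt prepended to the frozen VLA model's inputs. All class names, dimensions, and the cosine-similarity retrieval rule are illustrative assumptions, not the authors' released code.

```python
# Illustrative sketch only: names, dimensions, and the cosine-similarity
# retrieval rule are assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MemoryUnit(nn.Module):
    """One memory unit: a learnable soft prompt for a specific task stage,
    keyed by a reference trajectory feature taken from demonstrations."""

    def __init__(self, traj_key: torch.Tensor, prompt_len: int = 8, dim: int = 512):
        super().__init__()
        self.register_buffer("traj_key", traj_key)            # fixed key, not trained
        self.soft_prompt = nn.Parameter(0.02 * torch.randn(prompt_len, dim))


class MemoryLibrary(nn.Module):
    """Bank of memory units built offline from historical demonstrations."""

    def __init__(self, traj_keys, prompt_len: int = 8, dim: int = 512):
        super().__init__()
        self.units = nn.ModuleList(MemoryUnit(k, prompt_len, dim) for k in traj_keys)

    def retrieve(self, current_traj: torch.Tensor) -> torch.Tensor:
        """Pick the memory whose key is most similar to the trajectory executed
        so far (cosine similarity as a stand-in for trajectory similarity matching)."""
        sims = torch.stack(
            [F.cosine_similarity(u.traj_key, current_traj, dim=0) for u in self.units]
        )
        return self.units[int(sims.argmax())].soft_prompt


def augmented_action(frozen_vla: nn.Module, obs_tokens: torch.Tensor,
                     memory: MemoryLibrary, current_traj: torch.Tensor) -> torch.Tensor:
    """Prepend the retrieved soft prompt to the observation tokens and let the
    frozen VLA decode an action. During prompt tuning the VLA's parameters are
    frozen (requires_grad=False), so gradients only update the soft prompts."""
    prompt = memory.retrieve(current_traj).unsqueeze(0)        # (1, P, D)
    tokens = torch.cat([prompt, obs_tokens], dim=1)            # (1, P + T, D)
    return frozen_vla(tokens)
```

Under these assumptions, the offline phase would run this forward pass over demonstration segments with an imitation loss so that only the `soft_prompt` parameters receive gradients, while at execution time `retrieve` is called with the live trajectory and its output is simply concatenated in front of the frozen model's inputs, which is what makes the approach plug-and-play.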

Country of Origin
πŸ‡ΈπŸ‡¬ Singapore

Page Count
9 pages

Category
Computer Science:
Robotics