Experience is the Best Teacher: Grounding VLMs for Robotics through Self-Generated Memory
By: Guowei Lan, Kaixian Qu, René Zurbrügg, and more
Potential Business Impact:
Robots learn from mistakes to do tasks better.
Vision-language models (VLMs) have been widely adopted in robotics to enable autonomous planning. However, grounding VLMs, originally trained on internet data, to diverse real-world robots remains a challenge. This paper presents ExpTeach, a framework that grounds VLMs to physical robots by building a self-generated memory of real-world experiences. In ExpTeach, the VLM autonomously plans actions, verifies outcomes, reflects on failures, and adapts robot behaviors in a closed loop. The self-generated experiences during this process are then summarized into a long-term memory, enabling retrieval of learned knowledge to guide future tasks via retrieval-augmented generation (RAG). Additionally, ExpTeach enhances the spatial understanding of VLMs with an on-demand image annotation module. In experiments, we show that reflection improves success rates from 36% to 84% on four challenging robotic tasks and observe the emergence of intelligent object interactions, including creative tool use. Across extensive tests on 12 real-world scenarios (including eight unseen ones), we find that grounding with long-term memory boosts single-trial success rates from 22% to 80%, demonstrating the effectiveness and generalizability of ExpTeach.
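The abstract describes a closed loop (plan, execute, verify, reflect) whose summarized experiences feed a long-term memory retrieved via RAG for future tasks. The sketch below is a minimal, hypothetical rendering of that loop; every name in it (LongTermMemory, vlm_plan, execute_and_verify, and the toy word-overlap retrieval) is an assumption for illustration, not the authors' implementation.

```python
"""Minimal, hypothetical sketch of the closed loop described in the abstract:
plan -> execute -> verify -> reflect, with experiences summarized into a
long-term memory that is retrieved (RAG-style) to ground future tasks.
None of these names come from the paper; they are placeholders."""

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Experience:
    """One summarized trial: the task, the plan that worked, and the lesson learned."""
    task: str
    plan: str
    reflection: str


@dataclass
class LongTermMemory:
    """Stores summarized experiences; retrieves the most relevant ones for a new task."""
    items: list = field(default_factory=list)

    def add(self, exp: Experience) -> None:
        self.items.append(exp)

    def retrieve(self, task: str, k: int = 3) -> list:
        # Toy relevance: word overlap with stored task descriptions.
        # A real RAG pipeline would use embedding similarity instead.
        words = set(task.lower().split())
        return sorted(self.items,
                      key=lambda e: len(words & set(e.task.lower().split())),
                      reverse=True)[:k]


def vlm_plan(task: str, hints: list, feedback: Optional[str]) -> str:
    """Stand-in for the VLM planner: would prompt the model with the current
    image, the task, retrieved experiences, and any reflection on a prior failure."""
    hint_text = "; ".join(e.reflection for e in hints) or "none"
    return f"plan({task} | hints: {hint_text} | feedback: {feedback or 'none'})"


def execute_and_verify(plan: str, attempt: int) -> tuple:
    """Stand-in for robot execution plus VLM outcome verification.
    Here we simply pretend an unguided first attempt fails and the retry succeeds."""
    if attempt == 0 and "hints: none" in plan:
        return False, "object slipped out of the gripper"
    return True, "ok"


def run_task(task: str, memory: LongTermMemory, max_attempts: int = 3) -> bool:
    hints = memory.retrieve(task)  # RAG: ground the planner in past experience
    feedback = None
    for attempt in range(max_attempts):
        plan = vlm_plan(task, hints, feedback)
        success, observation = execute_and_verify(plan, attempt)
        if success:
            # Summarize the trial into long-term memory for future retrieval.
            memory.add(Experience(task, plan, f"worked after noting: {feedback or 'n/a'}"))
            return True
        # Reflection: the VLM would analyze the failure and adapt the next plan.
        feedback = f"previous attempt failed because: {observation}"
    return False


if __name__ == "__main__":
    memory = LongTermMemory()
    run_task("pick up the wet sponge", memory)                 # fails once, reflects, succeeds
    run_task("pick up the wet sponge from the sink", memory)   # succeeds in a single trial via memory
```

In this toy version, the second, similar task succeeds on its first attempt because the retrieved reflection is injected into the planner prompt, mirroring the single-trial improvement the paper reports; the real system additionally uses an on-demand image annotation module to sharpen the VLM's spatial grounding, which is omitted here.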
Similar Papers
ExploreVLM: Closed-Loop Robot Exploration Task Planning with Vision-Language Models
Robotics
Robots learn to explore and do tasks better.
ExpReS-VLA: Specializing Vision-Language-Action Models Through Experience Replay and Retrieval
Robotics
Makes robots learn new jobs faster and better.
Bridge Thinking and Acting: Unleashing Physical Potential of VLM with Generalizable Action Expert
CV and Pattern Recognition
Robots learn to do new tasks by watching.