AntiGrounding: Lifting Robotic Actions into VLM Representation Space for Decision Making
By: Wenbo Li, Shiyi Wang, Yiteng Chen, and more
Potential Business Impact:
Robots learn new tasks without practice.
Vision-Language Models (VLMs) encode knowledge and reasoning capabilities for robotic manipulation within high-dimensional representation spaces. However, current approaches often project these representations into compressed intermediate forms, discarding important task-specific information such as fine-grained spatial or semantic details. To address this, we propose AntiGrounding, a new framework that reverses the instruction grounding process: it lifts candidate actions directly into the VLM representation space, renders trajectories from multiple views, and uses structured visual question answering for instruction-based decision making. This enables zero-shot synthesis of optimal closed-loop robot trajectories for new tasks. We also introduce an offline policy refinement module that leverages past experience to enhance long-term performance. Experiments in both simulation and real-world environments show that our method outperforms baselines across diverse robotic manipulation tasks.
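The abstract describes a decision loop in which candidate actions are rendered as multi-view trajectory images and scored against the instruction via structured VQA. The sketch below illustrates that selection step only, under stated assumptions: the names `Candidate`, `select_best_candidate`, and `vqa_score`, the prompt wording, and the data layout are hypothetical placeholders, not the paper's actual implementation.

```python
"""Minimal sketch of an AntiGrounding-style candidate selection step.

Assumptions (not from the paper): the data structures, function names, and
prompt below are illustrative stand-ins for the framework's real components.
"""

from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Candidate:
    """One candidate action sequence (e.g., an end-effector trajectory)."""
    waypoints: List[Tuple[float, float, float]]  # hypothetical (x, y, z) waypoints
    rendered_views: List[object]                 # trajectory drawn onto images, one per camera view


def select_best_candidate(
    candidates: Sequence[Candidate],
    instruction: str,
    vqa_score: Callable[[List[object], str], float],
) -> Candidate:
    """Pick the candidate whose multi-view renderings best match the instruction.

    `vqa_score` is a hypothetical callable wrapping a VLM: it takes the rendered
    views plus a structured VQA prompt and returns a scalar suitability score.
    """
    prompt = (
        "The images show a proposed robot trajectory from several viewpoints. "
        f"How well does it accomplish the instruction: '{instruction}'? "
        "Answer with a score between 0 and 1."
    )
    scored = [(vqa_score(list(c.rendered_views), prompt), c) for c in candidates]
    return max(scored, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    # Toy stand-in for a VLM scorer so the sketch runs without any model:
    # it simply prefers candidates with fewer waypoints.
    def toy_vqa_score(views: List[object], prompt: str) -> float:
        return 1.0 / (1.0 + len(views))

    cands = [
        Candidate(waypoints=[(0, 0, 0), (1, 0, 0)], rendered_views=["view_a", "view_b"]),
        Candidate(waypoints=[(0, 0, 0), (0, 1, 0)], rendered_views=["view_a"]),
    ]
    best = select_best_candidate(cands, "push the red block to the left", toy_vqa_score)
    print("Selected candidate with", len(best.waypoints), "waypoints")
```

In a real closed-loop setting, this selection would be repeated at each step with freshly sampled candidates and re-rendered views; the offline refinement module mentioned in the abstract is not sketched here.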
Similar Papers
Perceiving, Reasoning, Adapting: A Dual-Layer Framework for VLM-Guided Precision Robotic Manipulation
Robotics
Robots learn to do tricky jobs with speed and accuracy.
Hierarchical Language Models for Semantic Navigation and Manipulation in an Aerial-Ground Robotic System
Robotics
Robots work together better using AI to move things.
Gentle Manipulation Policy Learning via Demonstrations from VLM Planned Atomic Skills
Robotics
Robots learn complex tasks without human help.