
SAGA: Open-World Mobile Manipulation via Structured Affordance Grounding

Published: December 14, 2025 | arXiv ID: 2512.12842v1

By: Kuan Fang, Yuxin Chen, Xinghao Zhu, and more

We present SAGA, a versatile and adaptive framework for visuomotor control that can generalize across various environments, task objectives, and user specifications. To efficiently learn such capability, our key idea is to disentangle high-level semantic intent from low-level visuomotor control by explicitly grounding task objectives in the observed environment. Using an affordance-based task representation, we express diverse and complex behaviors in a unified, structured form. By leveraging multimodal foundation models, SAGA grounds the proposed task representation to the robot's visual observation as 3D affordance heatmaps, highlighting task-relevant entities while abstracting away spurious appearance variations that would hinder generalization. These grounded affordances enable us to effectively train a conditional policy on multi-task demonstration data for whole-body control. In a unified framework, SAGA can solve tasks specified in different forms, including language instructions, selected points, and example demonstrations, enabling both zero-shot execution and few-shot adaptation. We instantiate SAGA on a quadrupedal manipulator and conduct extensive experiments across eleven real-world tasks. SAGA consistently outperforms end-to-end and modular baselines by substantial margins. Together, these results demonstrate that structured affordance grounding offers a scalable and effective pathway toward generalist mobile manipulation.
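As a rough illustration of what a 3D affordance heatmap over the robot's observation could look like, the sketch below scores each point of a point cloud by its distance to a set of task-relevant target locations (for example, points selected by a user) and attaches that score as an extra channel. This is not SAGA's implementation: the Gaussian falloff, the `sigma` parameter, and the names `affordance_heatmap` and `attach_heat` are all assumptions made for illustration, and the abstract does not specify how the heatmaps are computed or fed to the policy.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def affordance_heatmap(
    points: Sequence[Point],
    targets: Sequence[Point],
    sigma: float = 0.05,  # hypothetical falloff radius in metres
) -> List[float]:
    """Return one score in [0, 1] per point.

    Each score is a Gaussian falloff from the nearest target location.
    This kernel is an illustrative assumption, not the paper's method.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    heat = []
    for p in points:
        best = 0.0
        for q in targets:
            # Squared Euclidean distance between the point and the target.
            d2 = sum((a - b) ** 2 for a, b in zip(p, q))
            best = max(best, math.exp(-d2 / (2.0 * sigma ** 2)))
        heat.append(best)
    return heat


def attach_heat(
    points: Sequence[Point], heat: Sequence[float]
) -> List[Tuple[float, float, float, float]]:
    """Append the heat value as a fourth channel: (x, y, z, heat)."""
    return [(*p, h) for p, h in zip(points, heat)]


# Toy point cloud with a single target at the origin.
cloud = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (1.0, 0.0, 0.0)]
heat = affordance_heatmap(cloud, targets=[(0.0, 0.0, 0.0)])
observation = attach_heat(cloud, heat)
```

In this toy setup, points near the target receive scores close to 1 and distant points receive scores close to 0, so a downstream policy conditioned on `observation` would see where the task-relevant region lies without depending on object appearance.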

Scene-agnostic Hierarchical Bimanual Task Planning via Visual Affordance Reasoning

Robotics

Robots use two hands to do tasks better.

10 Dec 2025

88%

SAGAS: Semantic-Aware Graph-Assisted Stitching for Offline Temporal Logic Planning

Robotics

Lets robots learn tasks from old videos.

30 Nov 2025

87%

ODYSSEY: Open-World Quadrupeds Exploration and Manipulation for Long-Horizon Tasks

Robotics

Robot dog follows instructions to move and grab things.

11 Aug 2025
