O$^3$Afford: One-Shot 3D Object-to-Object Affordance Grounding for Generalizable Robotic Manipulation
By: Tongxuan Tian, Xuhui Kang, Yen-Ling Kuo
Potential Business Impact:
Robots learn to use objects together better.
Grounding object affordance is fundamental to robotic manipulation as it establishes the critical link between perception and action among interacting objects. However, prior works predominantly focus on predicting single-object affordance, overlooking the fact that most real-world interactions involve relationships between pairs of objects. In this work, we address the challenge of object-to-object affordance grounding under limited data constraints. Inspired by recent advances in few-shot learning with 2D vision foundation models, we propose a novel one-shot 3D object-to-object affordance learning approach for robotic manipulation. By combining semantic features from vision foundation models with point cloud representations for geometric understanding, our one-shot learning pipeline generalizes effectively to novel objects and categories. We further integrate our 3D affordance representation with large language models (LLMs) for robotic manipulation, significantly enhancing LLMs' capability to comprehend and reason about object interactions when generating task-specific constraint functions. Our experiments on 3D object-to-object affordance grounding and robotic manipulation demonstrate that O$^3$Afford significantly outperforms existing baselines in both accuracy and generalization capability.
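To make the one-shot grounding idea in the abstract concrete, here is a minimal sketch of how per-point semantic features from a 2D foundation model might be fused with point cloud geometry and transferred from a single annotated support object to a query object via prototype similarity. This is an illustrative assumption, not the paper's actual pipeline; the function names (`fuse_point_features`, `one_shot_affordance_transfer`), the concatenation-based fusion, and the cosine-similarity transfer rule are all hypothetical stand-ins.

```python
import numpy as np

def fuse_point_features(xyz, semantic_feats):
    """Concatenate normalized 3D coordinates (geometry) with projected
    foundation-model features (semantics). Fusion scheme is an assumption."""
    xyz_norm = (xyz - xyz.mean(axis=0)) / (xyz.std(axis=0) + 1e-8)
    sem_norm = semantic_feats / (np.linalg.norm(semantic_feats, axis=1, keepdims=True) + 1e-8)
    return np.concatenate([xyz_norm, sem_norm], axis=1)

def one_shot_affordance_transfer(support_feats, support_mask, query_feats, temperature=0.07):
    """Score each query point by cosine similarity to the prototype of the
    support object's annotated affordance region (a single labeled example)."""
    prototype = support_feats[support_mask].mean(axis=0)
    prototype /= np.linalg.norm(prototype) + 1e-8
    q = query_feats / (np.linalg.norm(query_feats, axis=1, keepdims=True) + 1e-8)
    logits = q @ prototype / temperature
    return 1.0 / (1.0 + np.exp(-logits))  # per-point affordance heatmap in [0, 1]

# Toy usage with random stand-ins for real point clouds and DINO-style features.
rng = np.random.default_rng(0)
support_xyz, query_xyz = rng.normal(size=(2048, 3)), rng.normal(size=(2048, 3))
support_sem, query_sem = rng.normal(size=(2048, 384)), rng.normal(size=(2048, 384))
support_mask = rng.random(2048) < 0.05  # one-shot annotated affordance region

support_feats = fuse_point_features(support_xyz, support_sem)
query_feats = fuse_point_features(query_xyz, query_sem)
heatmap = one_shot_affordance_transfer(support_feats, support_mask, query_feats)
print(heatmap.shape)  # (2048,) affordance score per query point
```

In the full system described by the abstract, a heatmap of this kind would then be exposed to an LLM so that generated task-specific constraint functions can reference the predicted interaction regions on both objects.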
Similar Papers
Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions
CV and Pattern Recognition
Robots learn to grab things from words.
Affordance-R1: Reinforcement Learning for Generalizable Affordance Reasoning in Multimodal Large Language Model
Robotics
Teaches robots how to use different objects.
Object Affordance Recognition and Grounding via Multi-scale Cross-modal Representation Learning
CV and Pattern Recognition
Teaches robots to grasp and use objects.