Afford-X: Generalizable and Slim Affordance Reasoning for Task-oriented Manipulation

Published: March 5, 2025 | arXiv ID: 2503.03556v2

By: Xiaomeng Zhu, Yuyang Li, Leiyao Cui and more

Potential Business Impact:

Helps robots understand how to use objects.

Business Areas:

Artificial Intelligence, Data and Analytics, Science and Engineering, Software

Object affordance reasoning, the ability to infer object functionalities based on physical properties, is fundamental for task-oriented planning and activities in both humans and Artificial Intelligence (AI). This capability, required for planning and executing daily activities in a task-oriented manner, relies on commonsense knowledge of object physics and functionalities, extending beyond simple object recognition. Current computational models for affordance reasoning from perception lack generalizability, limiting their applicability in novel scenarios. Meanwhile, comprehensive Large Language Models (LLMs) with emerging reasoning capabilities are challenging to deploy on local devices for task-oriented manipulations. Here, we introduce LVIS-Aff, a large-scale dataset comprising 1,496 tasks and 119k images, designed to enhance the generalizability of affordance reasoning from perception. Utilizing this dataset, we develop Afford-X, an end-to-end trainable affordance reasoning model that incorporates Verb Attention and Bi-Fusion modules to improve multi-modal understanding. This model achieves up to a 12.1% performance improvement over the best-reported results from non-LLM methods, while also demonstrating a 1.2% enhancement compared to our previous conference paper. Additionally, it maintains a compact 187M parameter size and infers nearly 50 times faster than the GPT-4V API. Our work demonstrates the potential for efficient, generalizable affordance reasoning models that can be deployed on local devices for task-oriented manipulations. We showcase Afford-X's effectiveness in enabling task-oriented manipulations for robots across various tasks and environments, underscoring its efficiency and broad implications for advancing robotics and AI systems in real-world applications.
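The abstract names a Verb Attention module but does not describe its internals. To make the general idea of task-conditioned object selection concrete, here is a toy sketch in plain Python. It is an illustration under loud assumptions, not the Afford-X architecture: the function names (`verb_weighted_query`, `select_object`), the `verb_bonus` parameter, and all feature vectors are hypothetical and hand-picked. It pools the task-phrase tokens into a single query while giving the verb extra weight, then picks the candidate object whose features best match that query.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def verb_weighted_query(token_vecs, verb_index, verb_bonus=2.0):
    """Pool token vectors into one query vector.

    The verb token receives an extra logit (verb_bonus, a hypothetical
    knob), so it dominates the weighted average.
    """
    logits = [verb_bonus if i == verb_index else 0.0
              for i in range(len(token_vecs))]
    weights = softmax(logits)
    dim = len(token_vecs[0])
    return [sum(w * v[d] for w, v in zip(weights, token_vecs))
            for d in range(dim)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def select_object(query, objects):
    """Return the name of the candidate most similar to the query."""
    return max(objects, key=lambda name: cosine(query, objects[name]))

# Hand-made toy features for the task phrase "drink water with".
task_tokens = [
    [0.9, 0.1, 0.0],  # "drink" (the verb)
    [0.2, 0.2, 0.6],  # "water"
    [0.1, 0.1, 0.1],  # "with"
]
# Hand-made toy features for three detected objects.
candidates = {
    "cup": [1.0, 0.2, 0.1],
    "knife": [0.0, 1.0, 0.1],
    "couch": [0.0, 0.1, 1.0],
}

query = verb_weighted_query(task_tokens, verb_index=0)
best = select_object(query, candidates)
print(best)
```

With these toy vectors the verb-weighted query lies closest to the "cup" features, so the sketch selects the cup. A real model would learn both the text and image features end to end rather than use fixed vectors.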

AffordGrasp: In-Context Affordance Reasoning for Open-Vocabulary Task-Oriented Grasping in Clutter

Robotics

Robots learn to grab objects from simple instructions.

2 Mar 2025 0

90%

AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models

CV and Pattern Recognition

Helps robots understand how to use objects.

13 Nov 2025 1

90%

RoboAfford++: A Generative AI-Enhanced Dataset for Multimodal Affordance Learning in Robotic Manipulation and Navigation

Robotics

Helps robots understand how to grab and move things.

16 Nov 2025 1

Country of Origin

🇨🇳 🇭🇰 Hong Kong, China

Page Count

28 pages
