Affostruction: 3D Affordance Grounding with Generative Reconstruction
By: Chunghyun Park, Seunghyeon Lee, Minsu Cho
This paper addresses the problem of affordance grounding from RGBD images of an object, which aims to localize the surface regions corresponding to a text query describing an action on the object. While existing methods predict affordance regions only on visible surfaces, we propose Affostruction, a generative framework that reconstructs complete geometry from partial observations and grounds affordances on the full shape, including unobserved regions. We make three core contributions: generative multi-view reconstruction via sparse voxel fusion that extrapolates unseen geometry while maintaining constant token complexity, flow-based affordance grounding that captures the inherent ambiguity of affordance distributions, and affordance-driven active view selection that leverages predicted affordances for intelligent viewpoint sampling. Affostruction achieves 19.1 aIoU on affordance grounding (a 40.4% improvement) and 32.67 IoU on 3D reconstruction (a 67.7% improvement), enabling accurate affordance prediction on complete shapes.
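To make the third contribution concrete, here is a minimal sketch of what affordance-driven active view selection could look like: candidate viewpoints are scored by how much predicted affordance mass they would newly reveal, and the highest-scoring view is chosen next. This is an illustrative toy, not the authors' implementation; all names, shapes, and the visibility and scoring heuristics below are assumptions.

```python
# Hypothetical sketch of affordance-driven active view selection.
# The paper's actual reconstruction and grounding models are replaced
# by random stand-in data here.
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the pipeline's intermediate outputs (assumed shapes):
# - voxel_centers: centers of the reconstructed sparse voxels (complete shape)
# - affordance_prob: per-voxel affordance scores from the grounding model
# - observed: voxels already covered by previous views
num_voxels = 2000
voxel_centers = rng.uniform(-1.0, 1.0, size=(num_voxels, 3))
affordance_prob = rng.uniform(0.0, 1.0, size=num_voxels)
observed = rng.random(num_voxels) < 0.4

def visible_mask(voxels, cam_pos):
    """Crude visibility proxy: voxels whose outward direction faces the camera."""
    to_cam = cam_pos - voxels
    to_cam /= np.linalg.norm(to_cam, axis=1, keepdims=True)
    outward = voxels / np.maximum(np.linalg.norm(voxels, axis=1, keepdims=True), 1e-8)
    return (to_cam * outward).sum(axis=1) > 0.2

def view_score(voxels, affordance, seen, cam_pos):
    """Score a candidate view by the predicted affordance mass it would newly reveal."""
    vis = visible_mask(voxels, cam_pos)
    return affordance[vis & ~seen].sum()

# Candidate viewpoints sampled on a ring around the object.
thetas = np.linspace(0, 2 * np.pi, 16, endpoint=False)
candidates = np.stack(
    [2.0 * np.cos(thetas), 2.0 * np.sin(thetas), np.full(16, 0.8)], axis=1
)

scores = [view_score(voxel_centers, affordance_prob, observed, c) for c in candidates]
best = int(np.argmax(scores))
print(f"next view: candidate {best}, expected new affordance mass {scores[best]:.2f}")
```

The greedy "most new affordance mass" criterion is one plausible reading of "intelligent viewpoint sampling"; the paper may use a different scoring rule or a learned policy.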
Similar Papers
Object Affordance Recognition and Grounding via Multi-scale Cross-modal Representation Learning
CV and Pattern Recognition
Teaches robots to grasp and use objects.
O$^3$Afford: One-Shot 3D Object-to-Object Affordance Grounding for Generalizable Robotic Manipulation
Robotics
Robots learn to use objects together better.
DAG: Unleash the Potential of Diffusion Model for Open-Vocabulary 3D Affordance Grounding
CV and Pattern Recognition
Helps robots know where to touch objects.