An Image-like Diffusion Method for Human-Object Interaction Detection
By: Xiaofei Hui, Haoxuan Qu, Hossein Rahmani and more
Potential Business Impact:
Teaches computers to recognize how people interact with objects.
Human-object interaction (HOI) detection often faces high levels of ambiguity and indeterminacy, as the same interaction can appear vastly different across human-object pairs. This indeterminacy can be further exacerbated by issues such as occlusions and cluttered backgrounds. To handle such a challenging task, in this work, we begin with a key observation: the output of HOI detection for each human-object pair can be recast as an image. Thus, inspired by the strong image generation capabilities of image diffusion models, we propose a new framework, HOI-IDiff. In HOI-IDiff, we tackle HOI detection from a novel perspective, using an image-like diffusion process to generate HOI detection outputs as images. Furthermore, recognizing that our recast images differ in certain properties from natural images, we enhance our framework with a customized HOI diffusion process and a slice patchification model architecture, which are specifically tailored to generate our recast "HOI images". Extensive experiments demonstrate the efficacy of our framework.
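To make the core idea more concrete, below is a minimal, illustrative sketch (not the authors' implementation) of recasting one human-object pair's detection output as a multi-channel "HOI image" and applying a toy DDPM-style noising/denoising step to it. The spatial resolution, channel layout, and the tiny convolutional denoiser standing in for the paper's slice patchification architecture are all assumptions made purely for illustration.

```python
# Illustrative sketch only: recast an HOI output as an image-like tensor and denoise it.
import torch
import torch.nn as nn

H = W = 32          # assumed spatial resolution of the recast "HOI image"
NUM_VERBS = 4       # assumed number of interaction (verb) classes in this toy example
C = 2 + NUM_VERBS   # channel 0: human box mask, channel 1: object box mask, rest: verb scores

def recast_as_hoi_image(human_box, object_box, verb_scores):
    """Rasterize one human-object pair's detection output into a C x H x W tensor."""
    img = torch.zeros(C, H, W)
    for ch, (x1, y1, x2, y2) in ((0, human_box), (1, object_box)):
        img[ch, y1:y2, x1:x2] = 1.0           # binary box masks
    img[2:] = verb_scores.view(-1, 1, 1)      # broadcast per-verb confidences spatially
    return img

class TinyDenoiser(nn.Module):
    """Toy stand-in (assumption) for the paper's slice patchification architecture."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(C, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, C, 3, padding=1),
        )
    def forward(self, x, t):
        return self.net(x)                    # predicts the noise added at step t

# DDPM-style schedule: forward noising, then a single reverse estimate of x0.
T = 100
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

x0 = recast_as_hoi_image((4, 4, 16, 20), (14, 10, 28, 26), torch.rand(NUM_VERBS))
t = 50
noise = torch.randn_like(x0)
xt = alphas_bar[t].sqrt() * x0 + (1 - alphas_bar[t]).sqrt() * noise   # q(x_t | x_0)

model = TinyDenoiser()
pred_noise = model(xt.unsqueeze(0), t).squeeze(0)
x0_hat = (xt - (1 - alphas_bar[t]).sqrt() * pred_noise) / alphas_bar[t].sqrt()
print(x0_hat.shape)   # torch.Size([6, 32, 32]) -- the denoised "HOI image"
```

In this reading, generating the "HOI image" amounts to generating the detection output itself: the box masks and verb-score channels are decoded back into a human box, an object box, and interaction labels after denoising.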
Similar Papers
iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer
Graphics
Makes fake people realistically grab and use things.
Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors
Graphics
Creates realistic 3D actions from text descriptions.
GenHOI: Generalizing Text-driven 4D Human-Object Interaction Synthesis for Unseen Objects
CV and Pattern Recognition
Generates realistic human-object interaction animations, even for unseen objects.