Spatial-Temporal Human-Object Interaction Detection
By: Xu Sun, Yunqing He, Tongwei Ren and more
Potential Business Impact:
Helps computers understand what people do in videos.
In this paper, we propose a new instance-level human-object interaction detection task for videos, called ST-HOID, which aims to distinguish fine-grained human-object interactions (HOIs) and the trajectories of their subjects and objects. It is motivated by the fact that HOI is crucial for human-centric video content understanding. To solve ST-HOID, we propose a novel method consisting of an object trajectory detection module and an interaction reasoning module. Furthermore, we construct the first dataset for ST-HOID evaluation, named VidOR-HOID, which contains 10,831 spatial-temporal HOI instances. We conduct extensive experiments to evaluate the effectiveness of our method. The experimental results demonstrate that our method outperforms baselines built from state-of-the-art methods for image human-object interaction detection, video visual relation detection, and video human-object interaction recognition.
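To make the task output concrete, the following is a minimal sketch of how one spatial-temporal HOI instance might be represented: a subject trajectory, an object trajectory, and an interaction label. The class names, field names, box format, and the example categories are assumptions made for illustration; they are not taken from the paper or from the VidOR-HOID annotation format.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Trajectory:
    """One tracked entity: a category plus one box per frame it appears in."""
    category: str
    boxes: Dict[int, Box]  # frame index -> bounding box


@dataclass
class STHOIInstance:
    """Illustrative container for a single spatial-temporal HOI instance."""
    subject: Trajectory   # the human
    obj: Trajectory       # the object being interacted with
    interaction: str      # fine-grained interaction label

    def span(self) -> Tuple[int, int]:
        """Return the first and last frame in which both entities appear."""
        shared = sorted(set(self.subject.boxes) & set(self.obj.boxes))
        if not shared:
            raise ValueError("subject and object never share a frame")
        return shared[0], shared[-1]


# Usage with made-up data: a person tracked over frames 0-4, a ball over 2-6.
person = Trajectory("adult", {f: (10.0, 10.0, 50.0, 120.0) for f in range(0, 5)})
ball = Trajectory("ball", {f: (40.0, 100.0, 60.0, 120.0) for f in range(2, 7)})
instance = STHOIInstance(subject=person, obj=ball, interaction="kick")
print(instance.span())
```

Keeping the trajectories as per-frame box dictionaries lets the subject and object cover different frame ranges, which is why the shared span is computed here and not stored.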
Similar Papers
Learning Human-Object Interaction as Groups
CV and Pattern Recognition
Helps computers understand group actions, not just pairs.
Learning to Generate Human-Human-Object Interactions from Textual Descriptions
CV and Pattern Recognition
Teaches computers to show people interacting with objects.
UniHOI: Unified Human-Object Interaction Understanding via Unified Token Space
CV and Pattern Recognition
Helps computers understand how people use things.