Improvement of Human-Object Interaction Action Recognition Using Scene Information and Multi-Task Learning Approach

Published: September 11, 2025 | arXiv ID: 2509.09067v3

By: Hesham M. Shehata, Mohammad Abdolrahmani

Potential Business Impact:

Helps computers see people using objects.

Business Areas:
Image Recognition Data and Analytics, Software

Recent graph convolutional neural networks (GCNs) have shown high performance in human action recognition using human skeleton poses. However, they fail to detect human-object interaction cases reliably because they lack an effective representation of scene information and an appropriate learning architecture. In this context, we propose a method that improves human action recognition performance by considering fixed-object information in the environment and following a multi-task learning approach. To evaluate the proposed method, we collected real data from public environments and prepared our own dataset, which includes interaction classes of hand interactions with fixed objects (e.g., ATM ticketing machines, check-in/out machines, etc.) and non-interaction classes of walking and standing. The multi-task learning approach, combined with interaction area information, recognizes the studied interaction and non-interaction actions with an accuracy of 99.25%, outperforming the base model, which uses only human skeleton poses, by 2.75%.
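
The abstract does not specify how interaction area information is encoded or how the tasks are combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes 2D joint coordinates, an axis-aligned rectangle around each fixed object, an auxiliary binary "interaction" task, and a weighted sum of the two losses. The names `in_area`, `multi_task_loss`, and `aux_weight` are hypothetical.

```python
import math

def in_area(point, box):
    """Return 1.0 if a 2D joint (x, y) lies inside the box (x0, y0, x1, y1).

    Assumption: the interaction area of a fixed object is an axis-aligned rectangle.
    """
    x, y = point
    x0, y0, x1, y1 = box
    return 1.0 if (x0 <= x <= x1 and y0 <= y <= y1) else 0.0

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (main task: action class)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[label]

def binary_cross_entropy(logit, target):
    """Sigmoid cross-entropy for one sample (auxiliary task: interaction or not)."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def multi_task_loss(action_logits, action_label,
                    interaction_logit, interaction_target, aux_weight=0.5):
    """Weighted sum of the main and auxiliary losses (aux_weight is an assumed hyperparameter)."""
    main = cross_entropy(action_logits, action_label)
    aux = binary_cross_entropy(interaction_logit, interaction_target)
    return main + aux_weight * aux

# Hypothetical usage: a hand joint near a machine, three action classes.
hand = (1.0, 1.0)
machine_area = (0.0, 0.0, 2.0, 2.0)
target = in_area(hand, machine_area)  # auxiliary label derived from the scene
loss = multi_task_loss([0.0, 0.0, 0.0], 0, 0.0, target)
```

In this sketch the scene information enters only through the auxiliary label; a real model could also feed the area indicator to the network as an extra input feature alongside the skeleton graph.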

Page Count
11 pages

Category
Computer Science:
Computer Vision and Pattern Recognition