
Improvement of Human-Object Interaction Action Recognition Using Scene Information and Multi-Task Learning Approach

Published: September 11, 2025 | arXiv ID: 2509.09067v3

By: Hesham M. Shehata, Mohammad Abdolrahmani

Potential Business Impact:

Helps computers see people using objects.

Business Areas:

Image Recognition, Data and Analytics, Software

Recent graph convolutional neural networks (GCNs) have shown high performance in human action recognition from human skeleton poses. However, they fail to reliably detect human-object interactions because they lack an effective representation of scene information and an appropriate learning architecture. In this context, we propose a methodology that improves human action recognition performance by considering information about fixed objects in the environment and following a multi-task learning approach. To evaluate the proposed method, we collected real data from public environments and prepared our own dataset, which includes interaction classes involving hands-on use of fixed objects (e.g., ATM ticketing machines, check-in/out machines) and non-interaction classes of walking and standing. The multi-task learning approach, combined with interaction area information, recognizes the studied interaction and non-interaction actions with an accuracy of 99.25%, outperforming the base model that uses only human skeleton poses by 2.75%.
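The abstract does not specify the auxiliary task, the loss weighting, or how the interaction area is encoded. As a minimal sketch, assuming the scene information is an annotated rectangular interaction area in image coordinates and the second task is a binary "inside the interaction area" prediction, a multi-task objective could add a weighted auxiliary loss to the action-classification loss. All names here (`area_overlap_ratio`, `multitask_loss`, `aux_weight`) and the class indices are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical interaction-area format: (x_min, y_min, x_max, y_max) in image coordinates.
Box = Tuple[float, float, float, float]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one sample, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def area_overlap_ratio(joints: Sequence[Point], area: Box) -> float:
    """Fraction of 2D skeleton joints that fall inside a rectangular interaction area.

    Hypothetical scene feature: one simple way to expose a fixed object's location.
    """
    if not joints:
        return 0.0
    x0, y0, x1, y1 = area
    inside = sum(1 for x, y in joints if x0 <= x <= x1 and y0 <= y <= y1)
    return inside / len(joints)


def multitask_loss(
    action_logits: Sequence[float],
    action_target: int,
    area_logits: Sequence[float],
    area_target: int,
    aux_weight: float = 0.5,
) -> float:
    """Main action loss plus a weighted auxiliary 'in interaction area' loss (assumed tasks)."""
    main = cross_entropy(action_logits, action_target)
    aux = cross_entropy(area_logits, area_target)
    return main + aux_weight * aux


if __name__ == "__main__":
    # Hypothetical classes: 0 = ATM use, 1 = check-in/out, 2 = walking, 3 = standing.
    joints = [(1.0, 1.0), (1.5, 1.2), (5.0, 5.0), (6.0, 4.0)]
    atm_area = (0.0, 0.0, 2.0, 2.0)
    print("overlap ratio:", area_overlap_ratio(joints, atm_area))
    print("loss:", multitask_loss([2.0, 0.1, -1.0, 0.3], 0, [0.2, 1.5], 1))
```

A weighted sum of per-task losses is the simplest multi-task formulation; `aux_weight` controls how strongly the auxiliary signal shapes the shared skeleton representation. Separately, if the reported 2.75% gain is read as percentage points, the skeleton-only base model reaches 99.25 - 2.75 = 96.50% accuracy.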


Page Count: 11 pages
