Signals vs. Videos: Advancing Motion Intention Recognition for Human-Robot Collaboration in Construction

Published: August 25, 2025 | arXiv ID: 2509.07990v1

By: Charan Gajjala Chenchu, Kinam Kim, Gao Lu, and more

Potential Business Impact:

Helps robots understand worker movements faster.

Business Areas:
Image Recognition, Data and Analytics, Software

Human-robot collaboration (HRC) in the construction industry depends on precise and prompt recognition of human motion intentions and actions by robots to maximize safety and workflow efficiency. There is a research gap in comparing data modalities, specifically signals and videos, for motion intention recognition. To address this, the study leverages deep learning to assess two different modalities for recognizing workers' motion intention at the early stage of movement in drywall installation tasks. The Convolutional Neural Network - Long Short-Term Memory (CNN-LSTM) model, using surface electromyography (sEMG) data, achieved an accuracy of around 87% with an average prediction time of 0.04 seconds per sample input. Meanwhile, the pre-trained Video Swin Transformer, combined with transfer learning, took video sequences as input to recognize motion intention and attained an accuracy of 94%, but with a longer average time of 0.15 seconds for a similar prediction. This study highlights the distinct strengths and trade-offs of both data formats, guiding their systematic deployment to enhance HRC in real-world construction projects.
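The paper does not include code, but the signal-based pipeline it describes can be sketched as follows: a minimal CNN-LSTM classifier for windowed sEMG signals, written in PyTorch. The electrode count (8), window length (200 samples), class count (4), and all layer sizes here are illustrative assumptions, not values from the study.

```python
# Illustrative sketch (not the authors' code): CNN-LSTM for sEMG-based
# motion intention classification. Channel count, window length, and
# class count are hypothetical placeholders.
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, n_channels=8, n_classes=4, hidden=64):
        super().__init__()
        # 1-D convolutions extract local temporal features from each window
        self.cnn = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # LSTM models the sequence of CNN feature vectors over time
        self.lstm = nn.LSTM(input_size=64, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                  # x: (batch, channels, time)
        feats = self.cnn(x)                # (batch, 64, time // 4)
        feats = feats.transpose(1, 2)      # (batch, time // 4, 64)
        out, _ = self.lstm(feats)
        return self.fc(out[:, -1])         # classify from the last hidden state

model = CNNLSTM()
logits = model(torch.randn(2, 8, 200))     # 2 windows of 8-channel sEMG
print(logits.shape)                        # torch.Size([2, 4])
```

Early-stage intention recognition would feed short windows captured at movement onset, so a small model like this keeps per-window inference latency low, consistent with the 0.04-second figure reported for the sEMG branch.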

Country of Origin
🇺🇸 United States

Page Count
8 pages

Category
Electrical Engineering and Systems Science:
Signal Processing