Beyond Motion Pattern: An Empirical Study of Physical Forces for Human Motion Understanding
By: Anh Dao , Manh Tran , Yufei Zhang and more
Human motion understanding has advanced rapidly through vision-based progress in recognition, tracking, and captioning. However, most existing methods overlook physical cues such as joint actuation forces that are fundamental in biomechanics. This gap motivates our study: if and when do physically inferred forces enhance motion understanding? By incorporating forces into established motion understanding pipelines, we systematically evaluate their impact across baseline models on 3 major tasks: gait recognition, action recognition, and fine-grained video captioning. Across 8 benchmarks, incorporating forces yields consistent performance gains; for example, on CASIA-B, Rank-1 gait recognition accuracy improved from 89.52% to 90.39% (+0.87), with larger gain observed under challenging conditions: +2.7% when wearing a coat and +3.0% at the side view. On Gait3D, performance also increases from 46.0% to 47.3% (+1.3). In action recognition, CTR-GCN achieved +2.00% on Penn Action, while high-exertion classes like punching/slapping improved by +6.96%. Even in video captioning, Qwen2.5-VL's ROUGE-L score rose from 0.310 to 0.339 (+0.029), indicating that physics-inferred forces enhance temporal grounding and semantic richness. These results demonstrate that force cues can substantially complement visual and kinematic features under dynamic, occluded, or appearance-varying conditions.
Similar Papers
Grounding Foundational Vision Models with 3D Human Poses for Robust Action Recognition
CV and Pattern Recognition
Teaches robots to understand actions by watching.
Breaking the Passive Learning Trap: An Active Perception Strategy for Human Motion Prediction
CV and Pattern Recognition
Teaches computers to predict human movements better.
Generative Action Tell-Tales: Assessing Human Motion in Synthesized Videos
CV and Pattern Recognition
Checks if fake human videos look real.