COMETH: Convex Optimization for Multiview Estimation and Tracking of Humans
By: Enrico Martini, Ho Jin Choi, Nadia Figueroa, and more
Potential Business Impact:
Tracks people's movements accurately and cheaply.
In the era of Industry 5.0, monitoring human activity is essential for ensuring both ergonomic safety and overall well-being. While multi-camera centralized setups improve pose estimation accuracy, they often suffer from high computational costs and bandwidth requirements, limiting scalability and real-time applicability. Distributing processing across edge devices can reduce network bandwidth and computational load. On the other hand, the constrained resources of edge devices lead to accuracy degradation, and the distribution of computation leads to temporal and spatial inconsistencies. We address this challenge by proposing COMETH (Convex Optimization for Multiview Estimation and Tracking of Humans), a lightweight algorithm for real-time multi-view human pose fusion that relies on three concepts: it integrates kinematic and biomechanical constraints to increase the joint positioning accuracy; it employs convex optimization-based inverse kinematics for spatial fusion; and it implements a state observer to improve temporal consistency. We evaluate COMETH on both public and industrial datasets, where it outperforms state-of-the-art methods in localization, detection, and tracking accuracy. The proposed fusion pipeline enables accurate and scalable human motion tracking, making it well-suited for industrial and safety-critical applications. The code is publicly available at https://github.com/PARCO-LAB/COMETH.
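The abstract names two ingredients: convex spatial fusion across cameras and a state observer for temporal consistency. Below is a minimal, illustrative sketch of both. It is not COMETH's algorithm, which relies on convex-optimization-based inverse kinematics with kinematic and biomechanical constraints. Here, spatial fusion is reduced to the simplest convex case: minimizing a confidence-weighted sum of squared distances for a single joint, whose minimizer is the weighted mean. The observer is a generic fixed-gain constant-velocity (alpha-beta) filter. The names `fuse_views` and `AlphaBetaObserver`, and the gains `alpha` and `beta`, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def fuse_views(estimates: Sequence[Vec3], weights: Sequence[float]) -> Vec3:
    """Hypothetical confidence-weighted fusion of one joint seen by several cameras.

    Minimizes the convex objective sum_i w_i * ||x - p_i||^2. For a positive
    total weight, its unique minimizer is the weighted mean of the p_i.
    (A simplified stand-in, not COMETH's constrained inverse kinematics.)
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one view needs a positive weight")
    return tuple(
        sum(w * p[k] for p, w in zip(estimates, weights)) / total for k in range(3)
    )


@dataclass
class AlphaBetaObserver:
    """Hypothetical fixed-gain constant-velocity observer (alpha-beta filter) for one joint."""

    alpha: float = 0.5  # position correction gain (assumed value)
    beta: float = 0.1   # velocity correction gain (assumed value)
    pos: Optional[Vec3] = None
    vel: Vec3 = (0.0, 0.0, 0.0)

    def update(self, measurement: Vec3, dt: float) -> Vec3:
        # The first measurement initializes the state directly.
        if self.pos is None:
            self.pos = measurement
            return self.pos
        # Predict with the constant-velocity model.
        pred = tuple(p + v * dt for p, v in zip(self.pos, self.vel))
        # Innovation: how far the new fused measurement is from the prediction.
        residual = tuple(m - q for m, q in zip(measurement, pred))
        # Correct position and velocity using fixed gains.
        self.pos = tuple(q + self.alpha * r for q, r in zip(pred, residual))
        self.vel = tuple(v + (self.beta / dt) * r for v, r in zip(self.vel, residual))
        return self.pos


if __name__ == "__main__":
    observer = AlphaBetaObserver()
    frames = [
        [(0.0, 0.0, 1.0), (0.2, 0.0, 1.0)],  # two cameras, frame 0
        [(0.1, 0.0, 1.0), (0.3, 0.0, 1.0)],  # two cameras, frame 1
    ]
    for views in frames:
        fused = fuse_views(views, [0.9, 0.6])  # per-view confidences (made up)
        print(observer.update(fused, dt=1 / 30))
```

With this split, each edge device would only need to send per-joint estimates and confidences. The fusion and smoothing steps stay cheap per frame, which matches the abstract's motivation, though the real pipeline adds constraints this sketch omits.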
Similar Papers
CoMotion: Concurrent Multi-person 3D Motion
CV and Pattern Recognition
Tracks many people's body movements in 3D.
Towards Metric-Aware Multi-Person Mesh Recovery by Jointly Optimizing Human Crowd in Camera Space
CV and Pattern Recognition
Makes 3D people in pictures stand correctly.
EchoMotion: Unified Human Video and Motion Generation via Dual-Modality Diffusion Transformer
CV and Pattern Recognition
Makes videos of people move more realistically.