COMPOSE: Hypergraph Cover Optimization for Multi-view 3D Human Pose Estimation
By: Tony Danjun Wang, Tolga Birdal, Nassir Navab and more
3D pose estimation from sparse multi-view images is a critical task for numerous applications, including action recognition, sports analysis, and human-robot interaction. Optimization-based methods typically follow a two-stage pipeline: they first detect 2D keypoints in each view and then associate these detections across views to triangulate the 3D pose. Existing methods rely solely on pairwise associations to model this correspondence problem, treating global consistency between views (i.e., cycle consistency) as a soft constraint. Yet reconciling these constraints across multiple views becomes brittle when spurious associations propagate errors. We thus propose COMPOSE, a novel framework that formulates multi-view pose correspondence matching as a hypergraph partitioning problem rather than as a set of pairwise associations. While the complexity of the resulting integer linear program grows exponentially in theory, we introduce an efficient geometric pruning strategy that substantially reduces the search space. COMPOSE achieves improvements of up to 23% in average precision over previous optimization-based methods and up to 11% over self-supervised end-to-end learned methods, offering a promising solution to a widely studied problem.
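To make the idea in the abstract concrete, the toy sketch below enumerates candidate hyperedges (at most one detection per view), prunes those containing a geometrically inconsistent pair, and then selects the best set of disjoint hyperedges. It is not the authors' implementation. The 2D positions, the threshold `TAU`, the pairwise test `pair_ok`, the `weight` function, and the brute-force search that stands in for the integer linear program are all assumptions made for illustration.

```python
from itertools import combinations, product
from math import dist

TAU = 1.0  # hypothetical consistency threshold (assumption, not from the paper)

# Toy detections keyed by (view, index). Each value is a made-up 2D position in
# a shared frame, standing in for real geometric cues such as epipolar distance.
positions = {
    (0, 0): (0.0, 0.1), (0, 1): (5.0, 5.1),
    (1, 0): (0.1, 0.0), (1, 1): (4.9, 5.0),
    (2, 0): (0.0, -0.1),  # the second person is not detected in view 2
}

def pair_ok(a, b):
    """Assumed geometric test: two detections are compatible if they lie close."""
    return dist(positions[a], positions[b]) < TAU

def candidate_hyperedges(nodes, min_views=2):
    """Enumerate hyperedges with one detection per view, keeping only those
    whose detections are all pairwise compatible (the pruning step)."""
    by_view = {}
    for view, idx in nodes:
        by_view.setdefault(view, []).append((view, idx))
    views = sorted(by_view)
    edges = []
    for k in range(min_views, len(views) + 1):
        for subset in combinations(views, k):
            for combo in product(*(by_view[v] for v in subset)):
                if all(pair_ok(a, b) for a, b in combinations(combo, 2)):
                    edges.append(frozenset(combo))
    return edges

def weight(edge):
    """Assumed score: sum of pairwise margins, so larger consistent groups win."""
    return sum(TAU - dist(positions[a], positions[b])
               for a, b in combinations(sorted(edge), 2))

def best_partition(edges):
    """Exhaustive search over sets of disjoint hyperedges. Exponential in the
    number of edges; a stand-in for an integer linear program solver."""
    best = (0.0, [])

    def search(i, used, total, chosen):
        nonlocal best
        if i == len(edges):
            if total > best[0]:
                best = (total, list(chosen))
            return
        edge = edges[i]
        if not (edge & used):  # each detection may belong to one group only
            chosen.append(edge)
            search(i + 1, used | edge, total + weight(edge), chosen)
            chosen.pop()
        search(i + 1, used, total, chosen)  # also try skipping this edge

    search(0, frozenset(), 0.0, [])
    return best

edges = candidate_hyperedges(positions)
total, chosen = best_partition(edges)
for group in chosen:
    print(sorted(group))
```

In this toy setup, pruning leaves 5 of the 12 possible hyperedges, and the search groups the three detections of the first person together and the two detections of the second person together. Detections left uncovered by any selected hyperedge would simply be treated as unmatched.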
Similar Papers
COMETH: Convex Optimization for Multiview Estimation and Tracking of Humans
CV and Pattern Recognition
Tracks people's movements accurately and cheaply.
Point2Pose: A Generative Framework for 3D Human Pose Estimation with Multi-View Point Cloud Dataset
CV and Pattern Recognition
Helps computers understand how people move in 3D.
AlignPose: Generalizable 6D Pose Estimation via Multi-view Feature-metric Alignment
CV and Pattern Recognition
Helps robots see objects from many angles.