PersPose: 3D Human Pose Estimation with Perspective Encoding and Perspective Rotation
By: Xiaoyang Hao, Han Li
Potential Business Impact:
Helps computers guess body positions from pictures.
Monocular 3D human pose estimation (HPE) methods estimate the 3D positions of joints from individual images. Existing 3D HPE approaches often use the cropped image alone as input for their models. However, the relative depths of joints cannot be accurately estimated from cropped images without the corresponding camera intrinsics, which determine the perspective relationship between 3D objects and the cropped images. In this work, we introduce Perspective Encoding (PE) to encode the camera intrinsics of the cropped images. Moreover, since the human subject can appear anywhere within the original image, the perspective relationship between the 3D scene and the cropped image varies significantly from one crop to another, which complicates model fitting. Additionally, the farther the human subject lies from the image center, the greater the perspective distortion in the cropped image. To address these issues, we propose Perspective Rotation (PR), a transformation applied to the original image that centers the human subject, thereby reducing perspective distortion and easing model fitting. By incorporating PE and PR, we propose a novel 3D HPE framework, PersPose. Experimental results demonstrate that PersPose achieves state-of-the-art (SOTA) performance on the 3DPW, MPI-INF-3DHP, and Human3.6M datasets. For example, on the in-the-wild dataset 3DPW, PersPose achieves an MPJPE of 60.1 mm, 7.54% lower than the previous SOTA approach. Code is available at: https://github.com/KenAdamsJoseph/PersPose.
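The two geometric ideas in the abstract can be illustrated with standard pinhole-camera algebra. The sketch below is not the authors' implementation (see the linked repository for that); it only assumes that a crop changes the intrinsics by shifting the principal point and rescaling, and that a subject-centering transformation can be modeled as a pure camera rotation applied to the image through the homography K R K^-1. All function names are hypothetical.

```python
import numpy as np

def crop_intrinsics(K, x0, y0, scale):
    """Intrinsics of a crop whose top-left corner is (x0, y0) in the
    original image and which is then resized by `scale`.
    Assumption: half-pixel offset conventions are ignored."""
    K_crop = K.astype(float).copy()
    K_crop[0, 2] -= x0      # shift principal point into crop coordinates
    K_crop[1, 2] -= y0
    K_crop[:2] *= scale     # resizing scales focal lengths and principal point
    return K_crop

def centering_rotation(K, center_px):
    """Rotation that maps the viewing ray through `center_px` (for example,
    the center of the person's bounding box) onto the optical axis."""
    ray = np.linalg.inv(K) @ np.array([center_px[0], center_px[1], 1.0])
    ray /= np.linalg.norm(ray)
    z = np.array([0.0, 0.0, 1.0])
    axis = np.cross(ray, z)
    s = np.linalg.norm(axis)    # sine of the rotation angle
    c = ray @ z                 # cosine of the rotation angle
    if s < 1e-12:               # ray already on the optical axis
        return np.eye(3)
    axis /= s
    # Rodrigues' formula: R = I + sin(t) [a]x + (1 - cos(t)) [a]x^2
    A = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + s * A + (1.0 - c) * (A @ A)

def rotation_homography(K, R):
    """Homography that warps original pixels to the rotated camera's pixels."""
    return K @ R @ np.linalg.inv(K)
```

With this construction, the chosen box center is warped exactly onto the principal point of the rotated view, so the subject sits on the optical axis before cropping. The intrinsics returned by `crop_intrinsics` are the kind of quantity an encoding of the crop's camera parameters would need as input; how PersPose actually encodes them is not specified in the abstract.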
Similar Papers
UniHPR: Unified Human Pose Representation via Singular Value Contrastive Learning
CV and Pattern Recognition
Makes computers understand human poses from different views.
Physics Informed Human Posture Estimation Based on 3D Landmarks from Monocular RGB-Videos
CV and Pattern Recognition
Makes exercise apps understand your body better.
ViPE: Video Pose Engine for 3D Geometric Perception
CV and Pattern Recognition
Makes robots understand 3D shapes from videos.