DevilSight: Augmenting Monocular Human Avatar Reconstruction through a Virtual Perspective
By: Yushuo Chen, Ruizhi Shao, Youxin Pang, and more
Potential Business Impact:
Builds 3D models of people from videos.
We present a novel framework to reconstruct human avatars from monocular videos. Recent approaches have struggled either to capture fine-grained dynamic details from the input or to generate plausible details at novel viewpoints; these shortcomings mainly stem from the limited representational capacity of the avatar model and insufficient observational data. To overcome these challenges, we propose to leverage an advanced video generative model, Human4DiT, to generate the human motion from an alternative perspective as an additional supervision signal. This approach not only enriches the details in previously unseen regions but also effectively regularizes the avatar representation to mitigate artifacts. Furthermore, we introduce two complementary strategies to enhance video generation: first, to ensure consistent reproduction of the subject's motion, we inject the physical identity into the model through video fine-tuning; second, for higher-resolution outputs with finer details, a patch-based denoising algorithm is employed. Experimental results demonstrate that our method outperforms recent state-of-the-art approaches and validate the effectiveness of our proposed strategies.
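Two of the abstract's ingredients translate naturally into a short sketch: supervising the avatar's renders from a virtual viewpoint against frames produced by the video generator, and denoising a high-resolution frame patch by patch. The Python below is a minimal illustration under assumed interfaces; `avatar.render`, `denoise_fn`, and the generated-frame tensors are hypothetical placeholders, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def training_step(avatar, observed_frames, observed_camera,
                  generated_frames, virtual_camera, lambda_virtual=0.5):
    """One optimization step combining real and generated-view supervision.

    `avatar.render(camera)` is a hypothetical differentiable renderer;
    `generated_frames` stands in for frames produced by a video generative
    model (e.g. Human4DiT) for the same motion seen from a virtual camera.
    """
    # Supervision from the input monocular video (observed viewpoint).
    pred_obs = avatar.render(observed_camera)
    loss_obs = F.l1_loss(pred_obs, observed_frames)

    # Additional supervision from the virtual viewpoint, regularizing
    # regions never seen in the monocular input.
    pred_virtual = avatar.render(virtual_camera)
    loss_virtual = F.l1_loss(pred_virtual, generated_frames)

    # Weighted sum keeps the real observations dominant.
    return loss_obs + lambda_virtual * loss_virtual


def denoise_high_res(noisy, denoise_fn, patch=64, overlap=16):
    """Patch-based denoising sketch: run a denoiser on overlapping tiles of a
    (B, C, H, W) tensor and average the overlaps, so a model trained at a
    lower resolution can process a larger frame. For simplicity the spatial
    size is assumed to align with the patch grid.
    """
    _, _, h, w = noisy.shape
    out = torch.zeros_like(noisy)
    weight = torch.zeros_like(noisy)
    stride = patch - overlap
    for y in range(0, max(h - patch, 0) + 1, stride):
        for x in range(0, max(w - patch, 0) + 1, stride):
            tile = noisy[:, :, y:y + patch, x:x + patch]
            out[:, :, y:y + patch, x:x + patch] += denoise_fn(tile)
            weight[:, :, y:y + patch, x:x + patch] += 1.0
    return out / weight.clamp(min=1.0)
```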
Similar Papers
MVP4D: Multi-View Portrait Video Diffusion for Animatable 4D Avatars
CV and Pattern Recognition
Makes digital people move realistically from one photo.
Bringing Your Portrait to 3D Presence
CV and Pattern Recognition
Turns one photo into a moving 3D person.
HRM^2Avatar: High-Fidelity Real-Time Mobile Avatars from Monocular Phone Scans
Graphics
Makes realistic digital people from phone videos.