DiffProxy: Multi-View Human Mesh Recovery via Diffusion-Generated Dense Proxies
By: Renke Wang, Zhenyu Zhang, Ying Tai and more
Potential Business Impact:
Lets computers recover accurate 3D body meshes of people from pictures taken from multiple viewpoints.
Human mesh recovery from multi-view images faces a fundamental challenge: real-world datasets contain imperfect ground-truth annotations that bias model training, while synthetic data with precise supervision suffers from a domain gap. In this paper, we propose DiffProxy, a novel framework that generates multi-view consistent human proxies for mesh recovery. Central to DiffProxy is the use of diffusion-based generative priors to bridge synthetic training and real-world generalization. Its key innovations include: (1) a multi-conditional mechanism for generating multi-view consistent, pixel-aligned human proxies; (2) a hand refinement module that incorporates flexible visual prompts to enhance local details; and (3) an uncertainty-aware test-time scaling method that increases robustness to challenging cases during optimization. These designs ensure that the mesh recovery process benefits from both the precise synthetic ground truth and the generative advantages of the diffusion-based pipeline. Trained entirely on synthetic data, DiffProxy achieves state-of-the-art performance across five real-world benchmarks, demonstrating strong zero-shot generalization, particularly in challenging scenarios with occlusions and partial views. Project page: https://wrk226.github.io/DiffProxy.html