R-Meshfusion: Reinforcement Learning Powered Sparse-View Mesh Reconstruction with Diffusion Priors
By: Haoyang Wang, Liming Liu, Peiheng Wang, and more
Potential Business Impact:
Builds 3D shapes from few pictures.
Mesh reconstruction from multi-view images is a fundamental problem in computer vision, but its performance degrades significantly under sparse-view conditions, especially in unseen regions where no ground-truth observations are available. While recent advances in diffusion models have demonstrated strong capabilities in synthesizing novel views from limited inputs, their outputs often suffer from visual artifacts and lack 3D consistency, posing challenges for reliable mesh optimization. In this paper, we propose a novel framework that leverages diffusion models to enhance sparse-view mesh reconstruction in a principled and reliable manner. To address the instability of diffusion outputs, we propose a Consensus Diffusion Module that filters unreliable generations via interquartile range (IQR) analysis and performs variance-aware image fusion to produce robust pseudo-supervision. Building on this, we design an online reinforcement learning strategy based on the Upper Confidence Bound (UCB) to adaptively select the most informative viewpoints for enhancement, guided by diffusion loss. Finally, the fused images are used to jointly supervise a NeRF-based model alongside sparse-view ground truth, ensuring consistency across both geometry and appearance. Extensive experiments demonstrate that our method achieves significant improvements in both geometric quality and rendering quality.
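The Consensus Diffusion Module described above can be illustrated with a small sketch: given several diffusion outputs for one viewpoint, candidates whose deviation from the per-pixel median is an IQR outlier are discarded, the survivors are averaged into a pseudo-label, and the across-candidate variance yields a per-pixel confidence map that can down-weight unstable regions during supervision. This is a minimal, assumed interpretation of the module (function name, grayscale images, and the confidence normalization are all illustrative, not the paper's implementation):

```python
import numpy as np

def consensus_fuse(images):
    """Hypothetical sketch of IQR filtering + variance-aware fusion.

    images: array of shape (K, H, W) -- K candidate diffusion outputs
    for a single viewpoint (grayscale for simplicity).
    """
    images = np.asarray(images, dtype=np.float64)

    # Score each candidate by its mean absolute distance
    # to the per-pixel median across all candidates.
    median = np.median(images, axis=0)
    scores = np.mean(np.abs(images - median), axis=(1, 2))

    # IQR rule: drop candidates whose score is a high outlier.
    q1, q3 = np.percentile(scores, [25, 75])
    keep = scores <= q3 + 1.5 * (q3 - q1)
    inliers = images[keep]

    # Fuse the surviving candidates into one pseudo-label.
    fused = np.mean(inliers, axis=0)

    # Variance-aware confidence: pixels on which the inliers
    # disagree get low weight when used as supervision.
    var = np.var(inliers, axis=0)
    weights = 1.0 / (var + 1e-6)
    confidence = weights / weights.max()  # normalized to [0, 1]
    return fused, confidence
```

In this reading, `fused` serves as the pseudo-supervision image and `confidence` as a per-pixel loss weight, so regions where the diffusion samples are inconsistent contribute less to mesh optimization.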
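The UCB-based viewpoint selection can likewise be sketched as a standard multi-armed bandit loop: each candidate viewpoint is an arm, the reward is some benefit signal for enhancing that view (the paper uses a diffusion-loss-guided signal; the deterministic `rewards_fn` below is a stand-in), and UCB1 trades off exploiting high-reward views against exploring rarely tried ones. Everything here is a generic UCB1 illustration, not the authors' code:

```python
import math

def ucb_select(counts, values, t, c=1.0):
    """Return the index with the highest UCB1 score.

    counts[i]: number of times viewpoint i has been enhanced so far
    values[i]: running mean reward for viewpoint i
    """
    best, best_score = 0, -float("inf")
    for i, (n, v) in enumerate(zip(counts, values)):
        if n == 0:
            return i  # try every viewpoint once before exploiting
        score = v + c * math.sqrt(2.0 * math.log(t) / n)
        if score > best_score:
            best, best_score = i, score
    return best

def run_ucb(rewards_fn, n_views, steps, c=1.0):
    """Online loop: pick a view, observe its reward, update statistics."""
    counts = [0] * n_views
    values = [0.0] * n_views
    for t in range(1, steps + 1):
        i = ucb_select(counts, values, t, c)
        r = rewards_fn(i)
        counts[i] += 1
        values[i] += (r - values[i]) / counts[i]  # incremental mean
    return counts, values
```

Run over many steps, the most informative viewpoint accumulates the most enhancement budget while the confidence bonus `sqrt(2 ln t / n)` still forces occasional revisits of the others.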
Similar Papers
Uncertainty-Aware Diffusion Guided Refinement of 3D Scenes
CV and Pattern Recognition
Creates realistic 3D worlds from one picture.
SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model
CV and Pattern Recognition
Cleans up blurry pictures using many photos.
Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards
CV and Pattern Recognition
Makes AI pictures match words better.