
Lightweight and Accurate Multi-View Stereo with Confidence-Aware Diffusion Model

Published: September 18, 2025 | arXiv ID: 2509.15220v1

By: Fangjinhua Wang, Qingshan Xu, Yew-Soon Ong, and others

Potential Business Impact:

Creates 3D shapes from pictures faster.

Business Areas:

Image Recognition, Data and Analytics, Software

To reconstruct 3D geometry from calibrated images, learning-based multi-view stereo (MVS) methods typically perform multi-view depth estimation and then fuse the depth maps into a mesh or point cloud. To improve computational efficiency, many methods initialize a coarse depth map and then gradually refine it at higher resolutions. Recently, diffusion models have achieved great success in generation tasks: starting from random noise, they gradually recover a sample through an iterative denoising process. In this paper, we propose a novel MVS framework that introduces diffusion models into MVS. Specifically, we formulate depth refinement as a conditional diffusion process. Considering the discriminative nature of depth estimation, we design a condition encoder to guide the diffusion process. To improve efficiency, we propose a novel diffusion network that combines a lightweight 2D U-Net with a convolutional GRU. Moreover, we propose a novel confidence-based sampling strategy that adaptively samples depth hypotheses based on the confidence estimated by the diffusion model. Based on this framework, we propose two novel MVS methods, DiffMVS and CasDiffMVS. DiffMVS achieves competitive performance with state-of-the-art efficiency in run-time and GPU memory. CasDiffMVS achieves state-of-the-art performance on DTU, Tanks & Temples, and ETH3D. Code is available at: https://github.com/cvg/diffmvs.
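The general idea behind confidence-based hypothesis sampling can be shown with a minimal per-pixel sketch. The abstract does not give the exact sampling rule, so everything below is an assumption. The function name `sample_depth_hypotheses`, the parameters `min_radius` and `max_radius`, and the linear mapping from confidence to search radius are all hypothetical. They illustrate one intent: confident pixels get a narrow set of depth hypotheses around the current estimate, and uncertain pixels get a wide one. This is not the DiffMVS implementation.

```python
from typing import List


def sample_depth_hypotheses(
    depth: float,
    confidence: float,
    num_hypotheses: int = 8,
    min_radius: float = 0.01,
    max_radius: float = 0.5,
) -> List[float]:
    """Hypothetical confidence-aware depth sampler for a single pixel.

    Assumption (not the paper's exact rule): the search radius shrinks
    linearly from max_radius (confidence 0) to min_radius (confidence 1),
    and hypotheses are spaced uniformly over [depth - r, depth + r].
    """
    if num_hypotheses < 2:
        raise ValueError("need at least two hypotheses")
    if not 0.0 <= min_radius <= max_radius:
        raise ValueError("require 0 <= min_radius <= max_radius")

    # Clamp the confidence into [0, 1] so out-of-range inputs stay well defined.
    c = min(max(confidence, 0.0), 1.0)

    # Higher confidence -> smaller search radius around the current depth.
    radius = max_radius - c * (max_radius - min_radius)

    # Uniform spacing, symmetric about the current depth estimate.
    step = 2.0 * radius / (num_hypotheses - 1)
    return [depth - radius + i * step for i in range(num_hypotheses)]


def sample_depth_map(
    depths: List[float], confidences: List[float], num_hypotheses: int = 8
) -> List[List[float]]:
    """Apply the hypothetical per-pixel sampler to a flattened depth map."""
    if len(depths) != len(confidences):
        raise ValueError("depths and confidences must have the same length")
    return [
        sample_depth_hypotheses(d, c, num_hypotheses)
        for d, c in zip(depths, confidences)
    ]
```

With the default radii, a pixel with confidence 1.0 receives hypotheses within about ±0.01 of its current depth, while a pixel with confidence 0.0 receives hypotheses spread over ±0.5. A linear schedule is simply the easiest to read. A learned or nonlinear mapping from confidence to range would serve the same purpose.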


Repos / Data Links:

github.com

Page Count:

15 pages
