MVRoom: Controllable 3D Indoor Scene Generation with Multi-View Diffusion Models
By: Shaoheng Fang, Chaohui Yu, Fan Wang, and more
Potential Business Impact:
Creates realistic 3D room views from a rough room layout.
We introduce MVRoom, a controllable novel view synthesis (NVS) pipeline for 3D indoor scenes that uses multi-view diffusion conditioned on a coarse 3D layout. MVRoom adopts a two-stage design in which the 3D layout is used throughout to enforce multi-view consistency. The first stage employs novel representations to effectively bridge the 3D layout and consistent image-based condition signals for multi-view generation. The second stage performs image-conditioned multi-view generation, incorporating a layout-aware epipolar attention mechanism to enhance multi-view consistency during the diffusion process. Additionally, we introduce an iterative framework that generates 3D scenes with varying object counts and scene complexities by recursively applying MVRoom's multi-view generation, which also supports text-to-scene generation. Experimental results demonstrate that our approach achieves high-fidelity, controllable 3D scene generation for NVS, outperforming state-of-the-art baselines both quantitatively and qualitatively. Ablation studies further validate the effectiveness of the key components of our generation pipeline.
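The abstract names a layout-aware epipolar attention mechanism but does not spell out its formulation. As a point of reference, below is a minimal NumPy sketch of the standard epipolar constraint that such an attention mask would build on: a target-view pixel is allowed to attend to a source-view pixel only if it lies near the epipolar line induced by that source pixel. The function names (`fundamental_matrix`, `epipolar_attention_mask`), the pixel-distance threshold, and the toy camera setup are illustrative assumptions, not the paper's implementation; the paper's layout-aware variant presumably restricts attention further using the coarse 3D layout.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_matrix(K1, R1, t1, K2, R2, t2):
    """Fundamental matrix F mapping pixels in view 1 to epipolar lines in
    view 2. (R_i, t_i) are world-to-camera extrinsics, K_i are intrinsics."""
    R = R2 @ R1.T                  # relative rotation, view 1 -> view 2
    t = t2 - R @ t1                # relative translation
    E = skew(t) @ R                # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

def epipolar_attention_mask(F, src_px, tgt_px, threshold=2.0):
    """Boolean (num_tgt, num_src) mask: a target pixel may attend to a source
    pixel only if it lies within `threshold` pixels of that source pixel's
    epipolar line. src_px / tgt_px are (N, 2) arrays of (x, y) coordinates."""
    src_h = np.concatenate([src_px, np.ones((len(src_px), 1))], axis=1)
    tgt_h = np.concatenate([tgt_px, np.ones((len(tgt_px), 1))], axis=1)
    lines = src_h @ F.T                           # (num_src, 3): l_i = F x_i
    dist = np.abs(tgt_h @ lines.T)                # (num_tgt, num_src): |l . x2|
    dist /= np.linalg.norm(lines[:, :2], axis=1)  # point-to-line distance
    return dist < threshold

# Toy example: two identical pinhole cameras looking down +z, the second
# shifted along x. Epipolar lines are then horizontal scanlines.
K = np.array([[256.0, 0.0, 128.0], [0.0, 256.0, 128.0], [0.0, 0.0, 1.0]])
R1, t1 = np.eye(3), np.zeros(3)
R2, t2 = np.eye(3), np.array([-0.5, 0.0, 0.0])
F = fundamental_matrix(K, R1, t1, K, R2, t2)
mask = epipolar_attention_mask(F,
                               src_px=np.array([[128.0, 128.0]]),
                               tgt_px=np.array([[200.0, 128.0],
                                                [200.0, 60.0]]))
print(mask)  # [[ True]
             #  [False]] -- same scanline passes, different scanline is masked
```

In a multi-view diffusion model, a mask like this would typically enter the cross-view attention layers as an additive bias (a large negative value on disallowed query-key pairs) so that each view's features aggregate only geometrically plausible correspondences from the other views.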
Similar Papers
DT-NVS: Diffusion Transformers for Novel View Synthesis
CV and Pattern Recognition
Creates new pictures of a scene from one photo.
MVCustom: Multi-View Customized Diffusion via Geometric Latent Rendering and Completion
CV and Pattern Recognition
Makes pictures look the same from any angle.
Lightweight and Accurate Multi-View Stereo with Confidence-Aware Diffusion Model
CV and Pattern Recognition
Creates 3D shapes from pictures faster.