MVCustom: Multi-View Customized Diffusion via Geometric Latent Rendering and Completion
By: Minjung Shin, Hyunin Cho, Sooyeon Go, and more
Potential Business Impact:
Makes pictures of a customized subject look consistent from any camera angle.
Multi-view generation with camera pose control and prompt-based customization are both essential for controllable generative models. However, existing multi-view generation models do not support customization with geometric consistency, whereas customization models lack explicit viewpoint control, making the two challenging to unify. Motivated by these gaps, we introduce a novel task, multi-view customization, which aims to jointly achieve multi-view camera pose control and customization. Because customization data are scarce, existing multi-view generation models, which inherently rely on large-scale datasets, struggle to generalize to diverse prompts. To address this, we propose MVCustom, a novel diffusion-based framework explicitly designed to achieve both multi-view consistency and customization fidelity. In the training stage, MVCustom learns the subject's identity and geometry using a feature-field representation, incorporating a text-to-video diffusion backbone enhanced with dense spatio-temporal attention, which leverages temporal coherence for multi-view consistency. In the inference stage, we introduce two novel techniques: depth-aware feature rendering explicitly enforces geometric consistency, and consistency-aware latent completion ensures accurate perspective alignment of the customized subject and surrounding backgrounds. Extensive experiments demonstrate that MVCustom is the only framework that simultaneously achieves faithful multi-view generation and customization.
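To make the depth-aware feature rendering idea concrete, the sketch below shows one common way such a step can be realized: latent features from a source view are unprojected with a depth map and re-projected into a target camera, leaving holes that a subsequent completion step would fill. This is a minimal PyTorch sketch, not the authors' implementation; the function name, the shared pinhole-camera convention, and all tensor shapes are assumptions made here for illustration.

```python
# Minimal sketch (assumed, not MVCustom's actual code) of depth-aware
# feature rendering: warp a source view's latent features into a target
# camera using the source depth map and a relative camera transform.

import torch


def render_features_to_target(feat_src, depth_src, K, T_src2tgt):
    """Warp a (C, H, W) source feature map into the target view.

    feat_src:   (C, H, W) latent features of the source view
    depth_src:  (H, W) per-pixel depth of the source view
    K:          (3, 3) shared pinhole intrinsics (assumed identical per view)
    T_src2tgt:  (4, 4) rigid transform from source to target camera frame
    Returns the warped features and a (H, W) bool mask of covered pixels;
    the uncovered pixels are the holes a completion step would fill.
    """
    C, H, W = feat_src.shape
    device = feat_src.device

    # Pixel grid in homogeneous coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(3, -1)  # (3, H*W)

    # Unproject pixels to 3D points in the source camera frame.
    cam_pts = (torch.linalg.inv(K) @ pix) * depth_src.reshape(1, -1)        # (3, H*W)
    cam_pts_h = torch.cat([cam_pts, torch.ones(1, H * W, device=device)], dim=0)

    # Transform into the target camera frame and project back to pixels.
    tgt_pts = (T_src2tgt @ cam_pts_h)[:3]                                   # (3, H*W)
    proj = K @ tgt_pts
    uv = proj[:2] / proj[2:].clamp(min=1e-6)                                # (2, H*W)

    # Forward-splat features onto the nearest target pixel
    # (collisions keep an arbitrary value in this simple sketch).
    u = uv[0].round().long()
    v = uv[1].round().long()
    in_front = tgt_pts[2] > 0
    valid = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)

    feat_tgt = torch.zeros_like(feat_src)
    mask_tgt = torch.zeros(H, W, dtype=torch.bool, device=device)
    idx = v[valid] * W + u[valid]
    feat_tgt.reshape(C, -1)[:, idx] = feat_src.reshape(C, -1)[:, valid]
    mask_tgt.reshape(-1)[idx] = True
    return feat_tgt, mask_tgt
```

In this reading, the returned mask marks which target-view pixels received source features; everything outside the mask corresponds to disoccluded regions, which is where a consistency-aware latent completion step, as described in the abstract, would synthesize content that matches both the customized subject and the background perspective.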
Similar Papers
Virtually Being: Customizing Camera-Controllable Video Diffusion Models with Multi-View Performance Captures
CV and Pattern Recognition
Makes videos with characters that look the same.
MVRoom: Controllable 3D Indoor Scene Generation with Multi-View Diffusion Models
CV and Pattern Recognition
Creates realistic 3D rooms from simple drawings.
CtrlVDiff: Controllable Video Generation via Unified Multimodal Video Diffusion
CV and Pattern Recognition
Makes videos change appearance and content easily.