UniView: Enhancing Novel View Synthesis From A Single Image By Unifying Reference Features
By: Haowang Cui, Rui Chen, Tao Luo, and more
Potential Business Impact:
Creates new pictures of objects from one photo.
The task of synthesizing novel views from a single image is highly ill-posed, since unobserved regions admit multiple plausible explanations. Most current methods tend to generate unseen regions from ambiguous priors and interpolation near the input view, which often leads to severe distortions. To address this limitation, we propose a novel model dubbed UniView, which leverages reference images of a similar object to provide strong prior information during view synthesis. More specifically, we construct a retrieval and augmentation system and employ a multimodal large language model (MLLM) to assist in selecting reference images that meet our requirements. Additionally, a plug-and-play adapter module with multi-level isolation layers is introduced to dynamically generate reference features for the target views. Moreover, to preserve the details of the original input image, we design a decoupled triple attention mechanism that effectively aligns and integrates multi-branch features into the synthesis process. Extensive experiments demonstrate that UniView significantly improves novel view synthesis performance and outperforms state-of-the-art methods on challenging datasets.
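To make the decoupled triple attention idea concrete, below is a minimal PyTorch sketch of how a target-view branch might attend separately to input-image features and adapter-generated reference features before fusing the three results. This is an illustrative reading of the abstract, not the paper's implementation: the class name `DecoupledTripleAttention`, the gating parameters, and all tensor shapes are assumptions.

```python
import torch
import torch.nn as nn

class DecoupledTripleAttention(nn.Module):
    """Hypothetical sketch: the target-view tokens run self-attention,
    plus two decoupled cross-attention branches (input image, reference
    features), and the results are fused additively with learned gates."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.input_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)  # cross-attn to input image
        self.ref_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)    # cross-attn to reference features
        self.norm = nn.LayerNorm(dim)
        # assumed design: gates keep reference cues from overwriting input-image detail
        self.gate_input = nn.Parameter(torch.ones(1))
        self.gate_ref = nn.Parameter(torch.zeros(1))

    def forward(self, target, input_feats, ref_feats):
        # target:      (B, N, C) target-view tokens being synthesized
        # input_feats: (B, M, C) tokens from the original input image
        # ref_feats:   (B, K, C) adapter-generated reference features
        h = self.norm(target)
        self_out, _ = self.self_attn(h, h, h)
        input_out, _ = self.input_attn(h, input_feats, input_feats)
        ref_out, _ = self.ref_attn(h, ref_feats, ref_feats)
        # decoupled fusion: each branch contributes its own additive term
        return target + self_out + self.gate_input * input_out + self.gate_ref * ref_out

# usage sketch with illustrative shapes
block = DecoupledTripleAttention(dim=256)
out = block(torch.randn(2, 64, 256), torch.randn(2, 64, 256), torch.randn(2, 32, 256))
```

Initializing the reference gate at zero is one plausible way to let reference features be blended in gradually; the paper may align and integrate the branches differently.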
Similar Papers
UniVideo: Unified Understanding, Generation, and Editing for Videos
CV and Pattern Recognition
Makes videos from words, pictures, and edits them.
UniModel: A Visual-Only Framework for Unified Multimodal Understanding and Generation
CV and Pattern Recognition
Makes computers see and create pictures from words.
ParaUni: Enhance Generation in Unified Multimodal Model with Reinforcement-driven Hierarchical Parallel Information Interaction
CV and Pattern Recognition
Makes AI pictures better by using more computer "eyes."