One-Shot Refiner: Boosting Feed-forward Novel View Synthesis via One-Step Diffusion
By: Yitong Dong, Qi Zhang, Minchao Jiang, and more
Potential Business Impact:
Turns a few photos into sharp, consistent new views of a scene.
We present a novel framework for high-fidelity novel view synthesis (NVS) from sparse images, addressing key limitations of recent feed-forward 3D Gaussian Splatting (3DGS) methods built on Vision Transformer (ViT) backbones. While ViT-based pipelines offer strong geometric priors, computational cost often constrains them to low-resolution inputs. Moreover, existing generative enhancement methods tend to be 3D-agnostic, producing inconsistent structures across views, especially in unseen regions. To overcome these challenges, we design a Dual-Domain Detail Perception Module, which handles high-resolution images without being limited by the ViT backbone and endows the Gaussians with additional features that store high-frequency details. We develop a feature-guided diffusion network that preserves these high-frequency details during restoration. Finally, we introduce a unified training strategy that jointly optimizes the ViT-based geometric backbone and the diffusion-based refinement module. Experiments demonstrate that our method achieves superior generation quality across multiple datasets.
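The abstract outlines the pipeline at a high level: a dual-domain (spatial plus frequency) module extracts detail features that the low-resolution ViT backbone would otherwise lose, and a one-step, feature-guided network refines the coarse render. The PyTorch sketch below is a minimal toy illustration of that flow, not the authors' implementation; all module names, layer sizes, and the FFT-based frequency branch are assumptions made only to show the idea.

```python
# Minimal sketch (NOT the paper's code): refine a coarse feed-forward render
# with a one-step, feature-conditioned network. Module names, channel sizes,
# and the frequency branch are illustrative assumptions.
import torch
import torch.nn as nn

class DualDomainDetailPerception(nn.Module):
    """Hypothetical dual-domain module: fuses a spatial conv branch with a
    frequency-domain branch so high-frequency detail from the full-resolution
    input survives alongside low-resolution backbone features."""
    def __init__(self, in_ch=3, feat_ch=16):
        super().__init__()
        self.spatial = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
        )
        # Frequency branch operates on the log-magnitude spectrum.
        self.freq = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
        )
        self.fuse = nn.Conv2d(2 * feat_ch, feat_ch, 1)

    def forward(self, hi_res_img):
        spat = self.spatial(hi_res_img)
        # Per-channel FFT magnitude as a crude high-frequency descriptor.
        spec = torch.fft.fft2(hi_res_img).abs().log1p()
        freq = self.freq(spec)
        return self.fuse(torch.cat([spat, freq], dim=1))

class OneStepRefiner(nn.Module):
    """Hypothetical one-step refiner: predicts a residual for the coarse
    render, conditioned on detail features (a stand-in for the paper's
    feature-guided diffusion network)."""
    def __init__(self, feat_ch=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + feat_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, coarse_render, detail_feats):
        x = torch.cat([coarse_render, detail_feats], dim=1)
        return coarse_render + self.net(x)  # single denoising-style step

# Toy usage: refine a 256x256 coarse render using features from the input view.
perceiver, refiner = DualDomainDetailPerception(), OneStepRefiner()
hi_res_input = torch.randn(1, 3, 256, 256)
coarse_render = torch.randn(1, 3, 256, 256)
refined = refiner(coarse_render, perceiver(hi_res_input))
print(refined.shape)  # torch.Size([1, 3, 256, 256])
```

In the paper's actual pipeline the detail features are attached to the Gaussians and rendered into each target view, which is what makes the refinement 3D-consistent; the sketch collapses that to a single image for brevity.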
Similar Papers
DT-NVS: Diffusion Transformers for Novel View Synthesis
CV and Pattern Recognition
Creates new pictures of a scene from one photo.
Sphinx: Efficiently Serving Novel View Synthesis using Regression-Guided Selective Refinement
CV and Pattern Recognition
Makes 3D scenes look real, super fast.
Learning High-Quality Initial Noise for Single-View Synthesis with Diffusion Models
CV and Pattern Recognition
Makes pictures look better from different angles.