ArbiViewGen: Controllable Arbitrary Viewpoint Camera Data Generation for Autonomous Driving via Stable Diffusion Models
By: Yatong Lan , Jingfeng Chen , Yiru Wang and more
Potential Business Impact:
Makes self-driving cars see from any angle.
Arbitrary viewpoint image generation holds significant potential for autonomous driving, yet remains a challenging task due to the lack of ground-truth data for extrapolated views, which hampers the training of high-fidelity generative models. In this work, we propose Arbiviewgen, a novel diffusion-based framework for the generation of controllable camera images from arbitrary points of view. To address the absence of ground-truth data in unseen views, we introduce two key components: Feature-Aware Adaptive View Stitching (FAVS) and Cross-View Consistency Self-Supervised Learning (CVC-SSL). FAVS employs a hierarchical matching strategy that first establishes coarse geometric correspondences using camera poses, then performs fine-grained alignment through improved feature matching algorithms, and identifies high-confidence matching regions via clustering analysis. Building upon this, CVC-SSL adopts a self-supervised training paradigm where the model reconstructs the original camera views from the synthesized stitched images using a diffusion model, enforcing cross-view consistency without requiring supervision from extrapolated data. Our framework requires only multi-camera images and their associated poses for training, eliminating the need for additional sensors or depth maps. To our knowledge, Arbiviewgen is the first method capable of controllable arbitrary view camera image generation in multiple vehicle configurations.
Similar Papers
Generative View Stitching
CV and Pattern Recognition
Makes videos avoid crashing into things.
From Satellite to Street: A Hybrid Framework Integrating Stable Diffusion and PanoGAN for Consistent Cross-View Synthesis
CV and Pattern Recognition
Makes street pictures from satellite maps.
FreeGen: Feed-Forward Reconstruction-Generation Co-Training for Free-Viewpoint Driving Scene Synthesis
CV and Pattern Recognition
Creates realistic driving videos from any angle.