SatDreamer360: Geometry Consistent Street-View Video Generation from Satellite Imagery
By: Xianghui Ze, Beiyi Zhu, Zhenbo Song and more
Potential Business Impact:
Turns a single satellite picture into a realistic street-level video.
Generating continuous ground-level video from satellite imagery is a challenging task with significant potential for applications in simulation, autonomous navigation, and digital twin cities. Existing approaches primarily focus on synthesizing individual ground-view images, often relying on auxiliary inputs like height maps or handcrafted projections, and fall short in producing temporally consistent sequences. In this paper, we propose SatDreamer360, a novel framework that generates geometrically and temporally consistent ground-view video from a single satellite image and a predefined trajectory. To bridge the large viewpoint gap, we introduce a compact tri-plane representation that encodes scene geometry directly from the satellite image. A ray-based pixel attention mechanism retrieves view-dependent features from the tri-plane, enabling accurate cross-view correspondence without requiring additional geometric priors. To ensure multi-frame consistency, we propose an epipolar-constrained temporal attention module that aligns features across frames using the known relative poses along the trajectory. To support evaluation, we introduce VIGOR++, a large-scale dataset for cross-view video generation, with dense trajectory annotations and high-quality ground-view sequences. Extensive experiments demonstrate that SatDreamer360 achieves superior performance in fidelity, coherence, and geometric alignment across diverse urban scenes.
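The abstract names three geometric ingredients: a tri-plane built from the satellite image, a ray-based attention that reads view-dependent features from it, and an epipolar constraint on temporal attention. The sketch below illustrates those generic ideas in NumPy under stated assumptions. It is not the SatDreamer360 implementation. The plane names (`xy`, `xz`, `yz`), summing the three plane features, sampling a fixed number of points along each ray in normalized [-1, 1]^3 scene coordinates, using raw tri-plane features as both keys and values in a scaled dot-product softmax, the pose convention X_b = R X_a + t, and the 2-pixel epipolar threshold are all illustrative choices. Every function name is hypothetical.

```python
import numpy as np

# Hypothetical sketch of tri-plane ray sampling and an epipolar attention mask.
# Not the authors' code; all names and conventions here are illustrative assumptions.


def bilinear_sample(plane, uv):
    """Sample a (C, H, W) plane at (N, 2) coords in [-1, 1]; uv[:, 0] -> width, uv[:, 1] -> height."""
    C, H, W = plane.shape
    uv = np.clip(uv, -1.0, 1.0)  # clamp points that fall outside the plane
    x = (uv[:, 0] + 1.0) * 0.5 * (W - 1)
    y = (uv[:, 1] + 1.0) * 0.5 * (H - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx, wy = x - x0, y - y0  # fractional offsets, shape (N,)
    top = plane[:, y0, x0] * (1 - wx) + plane[:, y0, x0 + 1] * wx
    bot = plane[:, y0 + 1, x0] * (1 - wx) + plane[:, y0 + 1, x0 + 1] * wx
    return (top * (1 - wy) + bot * wy).T  # (N, C)


def query_triplane(planes, pts):
    """Sum the features of the three axis-aligned projections of (N, 3) points."""
    return (bilinear_sample(planes["xy"], pts[:, [0, 1]])
            + bilinear_sample(planes["xz"], pts[:, [0, 2]])
            + bilinear_sample(planes["yz"], pts[:, [1, 2]]))


def ray_pixel_attention(query, planes, origin, direction, near=0.05, far=2.0, n_samples=32):
    """One pixel's query attends over tri-plane features sampled along its camera ray."""
    ts = np.linspace(near, far, n_samples)
    pts = origin[None, :] + ts[:, None] * direction[None, :]  # (S, 3) sample points
    feats = query_triplane(planes, pts)  # (S, C), used as both keys and values here
    logits = feats @ query / np.sqrt(query.shape[0])
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ feats, weights


def skew(t):
    """Cross-product matrix [t]_x."""
    return np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])


def epipolar_mask(K, R, t, pix_a, pix_b, thresh=2.0):
    """(Na, Nb) mask: True where pixel b lies within `thresh` px of the epipolar line of pixel a.

    Assumes shared intrinsics K, nonzero translation t, and X_b = R @ X_a + t.
    """
    K_inv = np.linalg.inv(K)
    F = K_inv.T @ skew(t) @ R @ K_inv  # fundamental matrix: x_b^T F x_a = 0
    ha = np.hstack([pix_a, np.ones((len(pix_a), 1))])
    hb = np.hstack([pix_b, np.ones((len(pix_b), 1))])
    lines = ha @ F.T  # row i is the epipolar line F @ x_a[i] in image b
    dist = np.abs(lines @ hb.T) / np.linalg.norm(lines[:, :2], axis=1, keepdims=True)
    return dist < thresh


# Usage: random tri-plane, one ray query, and a sideways camera shift of 1 unit.
rng = np.random.default_rng(0)
planes = {k: rng.standard_normal((8, 16, 16)) for k in ("xy", "xz", "yz")}
out, w = ray_pixel_attention(rng.standard_normal(8), planes,
                             origin=np.zeros(3), direction=np.array([0.0, 0.3, 1.0]))
K = np.array([[100.0, 0.0, 64.0], [0.0, 100.0, 64.0], [0.0, 0.0, 1.0]])
# The 3D point (0.2, 0.1, 2.0) projects to (74, 69) in view a and to (124, 69) in view b.
mask = epipolar_mask(K, np.eye(3), np.array([1.0, 0.0, 0.0]),
                     pix_a=np.array([[74.0, 69.0]]),
                     pix_b=np.array([[124.0, 69.0], [124.0, 80.0]]))
print(out.shape, w.sum(), mask)
```

In a trained model the queries, keys, and values would come from learned projections. The epipolar mask would be applied to the temporal attention logits so that each query in one frame only attends to positions near its epipolar line in neighboring frames.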
Similar Papers
Satellite to GroundScape -- Large-scale Consistent Ground View Generation from Satellite Views
CV and Pattern Recognition
Turns bird's-eye views into connected street scenes.
From Orbit to Ground: Generative City Photogrammetry from Extreme Off-Nadir Satellite Images
CV and Pattern Recognition
Creates 3D city views from satellite pictures.
From Satellite to Street: A Hybrid Framework Integrating Stable Diffusion and PanoGAN for Consistent Cross-View Synthesis
CV and Pattern Recognition
Makes street pictures from satellite maps.