Cross-view Localization and Synthesis -- Datasets, Challenges and Opportunities
By: Ningli Xu, Rongjun Qin
Potential Business Impact:
Finds where a ground-level picture was taken by using images from above.
Cross-view localization and synthesis are two fundamental tasks in cross-view visual understanding, which deals with datasets that combine overhead (satellite or aerial) and ground-level imagery. These tasks have gained increasing attention due to their broad applications in autonomous navigation, urban planning, and augmented reality. Cross-view localization aims to estimate the geographic position of ground-level images from overhead imagery, while cross-view synthesis seeks to generate ground-level images from overhead imagery. Both tasks remain challenging because of the significant differences in viewing perspective, resolution, and occlusion that are inherent in cross-view datasets. Recent years have witnessed rapid progress driven by the availability of large-scale datasets and novel approaches. Typically, cross-view localization is formulated as an image retrieval problem in which ground-level features are matched against the features of tiled overhead images, with both sets of features extracted by convolutional neural networks (CNNs) or vision transformers (ViTs) for cross-view feature embedding. Cross-view synthesis, on the other hand, is generally addressed with generative adversarial networks (GANs) or diffusion models. This paper presents a comprehensive survey of advances in cross-view localization and synthesis, reviewing widely used datasets, highlighting key challenges, and providing an organized overview of state-of-the-art techniques. Furthermore, it discusses current limitations, offers comparative analyses, and outlines promising directions for future research. The project page is available at https://github.com/GDAOSU/Awesome-Cross-View-Methods.
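The retrieval formulation mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the embeddings are hand-written toy vectors standing in for CNN or ViT outputs, cosine similarity is one common choice of matching score assumed here, and the names `retrieve_tiles`, `tile_a`, `tile_b`, and `tile_c` are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def retrieve_tiles(query_embedding, tile_embeddings, top_k=1):
    """Rank overhead tiles by similarity to a ground-level query embedding.

    tile_embeddings maps a tile identifier to its embedding. In a real
    system each tile would also carry geographic coordinates, so the best
    match gives the position estimate (assumption for this sketch).
    """
    scored = [
        (tile_id, cosine_similarity(query_embedding, emb))
        for tile_id, emb in tile_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumed values, not real network outputs).
tiles = {
    "tile_a": [0.9, 0.1, 0.0],
    "tile_b": [0.0, 1.0, 0.2],
    "tile_c": [0.1, 0.0, 1.0],
}
query = [0.8, 0.2, 0.1]  # embedding of a ground-level image

ranked = retrieve_tiles(query, tiles, top_k=2)
best_tile, best_score = ranked[0]
print(best_tile, round(best_score, 3))
```

In this toy setup the query is closest to `tile_a`, so that tile's location would serve as the position estimate. Practical systems replace the exhaustive loop with an approximate nearest-neighbor index when the number of tiles is large.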
Similar Papers
Revisiting Cross-View Localization from Image Matching
CV and Pattern Recognition
Helps cameras find places from the sky.
Robust Cross-View Geo-Localization via Content-Viewpoint Disentanglement
CV and Pattern Recognition
Find places on Earth using different pictures.
SkyLink: Unifying Street-Satellite Geo-Localization via UAV-Mediated 3D Scene Alignment
CV and Pattern Recognition
Find places from different pictures.