Audio Geolocation: A Natural Sounds Benchmark
By: Mustafa Chasmai, Wuao Liu, Subhransu Maji and more
Potential Business Impact:
Find people's location using only sounds.
Can we determine someone's geographic location purely from the sounds they hear? Are acoustic signals enough to localize within a country, state, or even city? We tackle the challenge of global-scale audio geolocation, formalize the problem, and conduct an in-depth analysis with wildlife audio from the iNatSounds dataset. Adopting a vision-inspired approach, we convert audio recordings to spectrograms and benchmark existing image geolocation techniques. We hypothesize that species vocalizations offer strong geolocation cues due to their defined geographic ranges and propose an approach that integrates species range prediction with retrieval-based geolocation. We further evaluate whether geolocation improves when analyzing species-rich recordings or when aggregating across spatiotemporal neighborhoods. Finally, we introduce case studies from movies to explore multimodal geolocation using both audio and visual content. Our work highlights the advantages of integrating audio and visual cues, and sets the stage for future research in audio geolocation.
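To make the idea of retrieval-based geolocation concrete, here is a minimal sketch, not the authors' implementation. It assumes each recording has already been turned into a fixed-length embedding (for example, from a spectrogram encoder) and that a gallery of embeddings with known coordinates is available. The names `retrieve_location`, `haversine_km`, and `cosine`, along with the toy vectors and coordinates, are hypothetical and chosen only for illustration.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    r = 6371.0  # assumed mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve_location(query, gallery):
    """Return the (lat, lon) of the gallery item most similar to the query."""
    best = max(gallery, key=lambda item: cosine(query, item[0]))
    return best[1]

# Toy gallery: (embedding, (lat, lon)). Values are made up for illustration.
gallery = [
    ([0.9, 0.1, 0.0], (42.39, -72.53)),
    ([0.0, 0.2, 0.9], (-33.87, 151.21)),
]
query = [0.8, 0.2, 0.1]

predicted = retrieve_location(query, gallery)
# Geolocation error against a hypothetical ground-truth location.
error_km = haversine_km(predicted[0], predicted[1], 42.39, -72.53)
```

Geolocation benchmarks commonly report the share of predictions whose great-circle error falls under distance thresholds at city, region, or country scale; the `error_km` value above is the quantity such thresholds would be applied to.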
Similar Papers
The Sonar Moment: Benchmarking Audio-Language Models in Audio Geo-Localization
Sound
Helps computers guess where sounds come from.
Cross-Modal Urban Sensing: Evaluating Sound-Vision Alignment Across Street-Level and Aerial Imagery
CV and Pattern Recognition
Maps cities by listening to their sounds.
The iNaturalist Sounds Dataset
Sound
Helps computers identify animal sounds worldwide.