Spatial Blind Spot: Auditory Motion Perception Deficits in Audio LLMs

Published: November 17, 2025 | arXiv ID: 2511.13273v1

By: Zhe Sun, Yujun Cai, Jiayu Yao, and more

Potential Business Impact:

Current audio AI models can't reliably tell which way a sound is moving.

Business Areas:
Audio Media and Entertainment, Music and Audio

Large Audio-Language Models (LALMs) have recently shown impressive progress in speech recognition, audio captioning, and auditory question answering. Yet whether these models can perceive spatial dynamics, particularly the motion of sound sources, remains unclear. In this work, we uncover a systematic motion perception deficit in current LALMs. To investigate this issue, we introduce AMPBench, the first benchmark explicitly designed to evaluate auditory motion understanding: a controlled question-answering benchmark that tests whether LALMs can infer the direction and trajectory of moving sound sources from binaural audio. Comprehensive quantitative and qualitative analyses reveal that current models struggle to reliably recognize motion cues or distinguish directional patterns, with average accuracy remaining below 50%, underscoring a fundamental limitation in auditory spatial reasoning. Our study highlights this gap between human and model auditory spatial reasoning, providing both a diagnostic tool and new insight for enhancing spatial cognition in future LALMs.
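To make the evaluation setup concrete, here is a minimal sketch of how one might synthesize a binaural-style clip with a moving source and frame a direction question for an audio model. The stimulus design (equal-power amplitude panning of a sine tone), the file name, and the prompt wording are illustrative assumptions, not the actual AMPBench generation pipeline.

```python
# Illustrative sketch only: synthesize a stereo clip whose source "moves"
# left to right via equal-power panning, then pose a direction question.
import numpy as np
import wave

SR = 16_000          # sample rate (Hz)
DURATION = 2.0       # clip length (seconds)
FREQ = 440.0         # source tone frequency (Hz)

t = np.arange(int(SR * DURATION)) / SR
tone = np.sin(2 * np.pi * FREQ * t)

# Crude interaural level difference: pan from fully left (0.0) to fully right (1.0).
pan = t / DURATION
left = tone * np.sqrt(1.0 - pan)     # equal-power panning gains
right = tone * np.sqrt(pan)

# Interleave channels and write a 16-bit stereo WAV.
stereo = np.stack([left, right], axis=1)
pcm = (stereo * 32767).astype(np.int16)
with wave.open("moving_source.wav", "wb") as f:
    f.setnchannels(2)
    f.setsampwidth(2)     # 16-bit samples
    f.setframerate(SR)
    f.writeframes(pcm.tobytes())

# A hypothetical multiple-choice question in the spirit of the benchmark.
question = (
    "Listen to the clip. In which direction does the sound source move?\n"
    "(A) left to right  (B) right to left  (C) it does not move"
)
ground_truth = "A"
print(question)
```

In this framing, model accuracy is simply the fraction of such questions answered with the ground-truth option; the paper reports that current LALMs stay below 50% on average.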

Country of Origin
🇦🇺 Australia

Page Count
10 pages

Category
Computer Science:
Sound