DOA Estimation with Lightweight Network on LLM-Aided Simulated Acoustic Scenes
By: Haowen Li, Zhengding Luo, Dongyuan Shi, and more
Potential Business Impact:
Helps microphones hear sounds from any direction.
Direction-of-Arrival (DOA) estimation is critical in spatial audio and acoustic signal processing, with wide-ranging real-world applications. Most existing DOA models are trained on synthetic data generated by convolving clean speech with room impulse responses (RIRs), and the constrained acoustic diversity of such data limits their generalizability. In this paper, we revisit DOA estimation using a recently introduced dataset constructed with the assistance of large language models (LLMs), which provides more realistic and diverse spatial audio scenes. We benchmark several representative neural DOA methods on this dataset and propose LightDOA, a lightweight DOA estimation model based on depthwise separable convolutions and specifically designed for multi-channel input in varying environments. Experimental results show that LightDOA achieves satisfactory accuracy and robustness across various acoustic scenes while maintaining low computational complexity. This study not only highlights the potential of LLM-assisted spatial audio synthesis for advancing research on robust and efficient DOA estimation, but also presents LightDOA as an efficient solution for resource-constrained applications.
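The abstract credits LightDOA's low computational cost to depthwise separable convolutions. The sketch below is a generic NumPy illustration of that building block, not the authors' implementation: a depthwise step applies one k×k filter per input channel, and a pointwise (1×1) step then mixes channels. It also compares parameter counts, since a standard k×k convolution needs C_in·C_out·k² weights while the separable version needs C_in·k² + C_in·C_out, roughly a 1/C_out + 1/k² fraction. The function names, "valid" padding, stride 1, the omission of bias terms, and the 64-channel layer size are all illustrative assumptions; the abstract does not describe LightDOA's actual layers.

```python
import numpy as np


def depthwise_separable_conv2d(x, dw_kernels, pw_weights):
    """Hypothetical helper: depthwise separable 2-D convolution (no bias).

    x          : (C_in, H, W) input feature maps, e.g. time-frequency
                 features derived from a multi-channel recording (assumption)
    dw_kernels : (C_in, k, k), one spatial filter per input channel
    pw_weights : (C_out, C_in), 1x1 pointwise channel-mixing weights
    Returns (C_out, H-k+1, W-k+1): 'valid' padding, stride 1, computed as
    cross-correlation, as deep-learning "convolution" layers usually are.
    """
    c_in, h, w = x.shape
    k = dw_kernels.shape[-1]
    out_h, out_w = h - k + 1, w - k + 1

    # Depthwise step: each channel is filtered independently by its own kernel.
    depthwise = np.zeros((c_in, out_h, out_w))
    for c in range(c_in):
        for i in range(out_h):
            for j in range(out_w):
                depthwise[c, i, j] = np.sum(x[c, i:i + k, j:j + k] * dw_kernels[c])

    # Pointwise step: a 1x1 convolution mixes channels at every output bin.
    return np.einsum("oc,chw->ohw", pw_weights, depthwise)


def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution layer (bias ignored)."""
    return c_in * c_out * k * k


def separable_conv_params(c_in, c_out, k):
    """Weights in a depthwise (k x k) + pointwise (1 x 1) pair (bias ignored)."""
    return c_in * k * k + c_in * c_out


if __name__ == "__main__":
    # Parameter comparison for an illustrative 64 -> 64 channel, 3x3 layer.
    print(standard_conv_params(64, 64, 3), separable_conv_params(64, 64, 3))  # -> 36864 4672
```

The separable pair is equivalent to a standard convolution whose kernel is constrained to the rank-one form W[o, c] = pw[o, c] · dw[c]. That restriction on how spatial filtering and channel mixing interact is what buys the roughly 8× parameter reduction in the example layer.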
Similar Papers
HYPERDOA: Robust and Efficient DoA Estimation using Hyperdimensional Computing
Signal Processing
Finds sound direction better with less power.
Audio-Visual Camera Pose Estimation with Passive Scene Sounds and In-the-Wild Video
CV and Pattern Recognition
Lets cameras know where they are using sound.
Spatial Audio Processing with Large Language Model on Wearable Devices
Sound
Listens to where sounds come from.