Deep Learning for Personalized Binaural Audio Reproduction
By: Xikun Lu, Yunda Chen, Zehua Chen and more
Potential Business Impact:
Makes headphones sound like real life.
Personalized binaural audio reproduction is the basis of realistic spatial localization, sound externalization, and immersive listening, directly shaping user experience and listening effort. This survey reviews recent advances in deep learning for this task and organizes them by generation mechanism into two paradigms: explicit personalized filtering and end-to-end rendering. Explicit methods predict personalized head-related transfer functions (HRTFs) from sparse measurements, morphological features, or environmental cues, and then use them in the conventional rendering pipeline. End-to-end methods map source signals directly to binaural signals, aided by other inputs such as visual, textual, or parametric guidance, and they learn personalization within the model. We also summarize the field's main datasets and evaluation metrics to support fair and repeatable comparison. Finally, we conclude with a discussion of key applications enabled by these technologies, current technical limitations, and potential research directions for deep learning-based spatial audio systems.
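As a rough illustration of the conventional rendering pipeline that explicit methods feed into, the sketch below convolves a mono source with a left-ear and a right-ear head-related impulse response (HRIR, the time-domain counterpart of an HRTF). The function name `render_binaural` and the toy HRIR values are assumptions made for illustration only; they are not taken from the survey, and a real system would use measured or predicted HRIRs for the listener and source direction.

```python
import numpy as np

def render_binaural(source, hrir_left, hrir_right):
    """Filter a mono signal with one HRIR per ear.

    Returns an array of shape (2, len(source) + len(hrir) - 1),
    row 0 = left ear, row 1 = right ear.
    Hypothetical helper, assuming both HRIRs have the same length.
    """
    left = np.convolve(source, hrir_left)
    right = np.convolve(source, hrir_right)
    return np.stack([left, right])

# Toy HRIRs (placeholders, not measured data): the right ear
# receives the sound 3 samples later and at half the amplitude,
# mimicking the interaural time and level differences of a
# source located to the listener's left.
hrir_left = np.array([1.0, 0.0, 0.0, 0.0])
hrir_right = np.array([0.0, 0.0, 0.0, 0.5])

source = np.array([1.0, 2.0, 3.0])
binaural = render_binaural(source, hrir_left, hrir_right)
```

In this framing, personalization amounts to swapping in HRIRs that match the individual listener; the convolution step itself stays unchanged.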