Spatial Audio Rendering for Real-Time Speech Translation in Virtual Meetings
By: Margarita Geleta, Hong Sodoma, Hannes Gamper
Potential Business Impact:
Makes virtual meetings easier to understand for everyone.
Language barriers in virtual meetings remain a persistent obstacle to global collaboration. Real-time translation offers promise, yet current integrations often neglect perceptual cues. This study investigates how spatial audio rendering of translated speech influences comprehension, cognitive load, and user experience in multilingual meetings. We conducted a within-subjects experiment with 8 bilingual confederates and 47 participants simulating global team meetings with English translations of Greek, Kannada, Mandarin Chinese, and Ukrainian, languages selected for their diversity in grammar, script, and resource availability. Participants experienced four audio conditions: spatial audio with and without background reverberation, and two non-spatial configurations (diotic, monaural). We measured listener comprehension accuracy, workload ratings, satisfaction scores, and qualitative feedback. Spatially rendered translations doubled comprehension compared with non-spatial audio. Participants reported greater clarity and engagement when spatial cues and voice-timbre differentiation were present. We discuss design implications for integrating real-time translation into meeting platforms, advancing inclusive, cross-language communication in telepresence systems.
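The four listening conditions compared in the abstract (monaural, diotic, spatial, and spatial with reverberation) can be sketched with a toy stereo renderer. Everything below is an illustrative assumption, not the study's actual rendering pipeline: the `render` function, the sample rate, the sine panning law for interaural level difference, the ~0.66 ms interaural time delay, and the single-echo "reverberation" are all simplified stand-ins for a real binaural renderer.

```python
import numpy as np

SR = 16_000  # sample rate in Hz (illustrative value)

def render(mono: np.ndarray, condition: str, azimuth_deg: float = 45.0) -> np.ndarray:
    """Render a mono speech signal under one listening condition.

    Returns an (n, 2) stereo array. The spatial model is a crude
    ILD/ITD approximation, used only to show how the conditions differ.
    """
    mono = np.asarray(mono, dtype=np.float64)

    if condition == "monaural":   # signal presented to one ear only
        return np.stack([mono, np.zeros_like(mono)], axis=1)
    if condition == "diotic":     # identical signal in both ears, no spatial cues
        return np.stack([mono, mono], axis=1)

    # Toy spatialization: interaural level difference via an equal-power
    # (sine) panning law, plus an interaural time difference of up to
    # ~0.66 ms, roughly the maximum ITD for a human head.
    pan = np.sin(np.deg2rad(azimuth_deg))            # -1 (left) .. +1 (right)
    g_left = np.sqrt((1.0 - pan) / 2.0)
    g_right = np.sqrt((1.0 + pan) / 2.0)
    itd = int(abs(pan) * 0.00066 * SR)               # delay in samples
    left, right = g_left * mono, g_right * mono
    if pan > 0:    # source on the right: the left ear's signal lags
        left = np.concatenate([np.zeros(itd), left])[: len(mono)]
    elif pan < 0:  # source on the left: the right ear's signal lags
        right = np.concatenate([np.zeros(itd), right])[: len(mono)]
    out = np.stack([left, right], axis=1)

    if condition == "spatial_reverb":
        # Stand-in for background reverberation: one decaying echo at 30 ms.
        delay = int(0.03 * SR)
        wet = np.zeros_like(out)
        wet[delay:] = 0.3 * out[:-delay]
        out = out + wet
    return out
```

Assigning each translated talker a distinct azimuth with such a renderer is one plausible way to realize the separation the abstract attributes to spatial cues, since listeners can then exploit level and timing differences between the ears to segregate voices.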
Similar Papers
Spatial Speech Translation: Translating Across Space With Binaural Hearables
Computation and Language
Hearables translate languages, keeping voices and directions clear.
Real-Time Auralization for First-Person Vocal Interaction in Immersive Virtual Environments
Audio and Speech Processing
Makes virtual reality sound like real places.
In-the-wild Audio Spatialization with Flexible Text-guided Localization
Sound
Makes game sounds move with your head.