Enhancing XR Auditory Realism via Multimodal Scene-Aware Acoustic Rendering
By: Tianyu Xu, Jihan Li, Penghe Zu, and more
Potential Business Impact:
Makes virtual sounds feel real in any space.
In Extended Reality (XR), rendering sound that accurately simulates real-world acoustics is pivotal in creating lifelike and believable virtual experiences. However, existing XR spatial audio rendering methods often struggle with real-time adaptation to diverse physical scenes, causing a sensory mismatch between visual and auditory cues that disrupts user immersion. To address this, we introduce SAMOSA, a novel on-device system that renders spatially accurate sound by dynamically adapting to its physical environment. SAMOSA leverages a synergistic multimodal scene representation by fusing real-time estimations of room geometry, surface materials, and semantic-driven acoustic context. This rich representation then enables efficient acoustic calibration via scene priors, allowing the system to synthesize a highly realistic Room Impulse Response (RIR). We validate our system through technical evaluation using acoustic metrics for RIR synthesis across various room configurations and sound types, alongside an expert evaluation (N=12). Evaluation results demonstrate SAMOSA's feasibility and efficacy in enhancing XR auditory realism.
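As a rough illustration of the pipeline the abstract describes, the sketch below fuses estimated room geometry and surface-material absorption into a scene representation, derives a reverberation-time prior with Sabine's equation, synthesizes a toy RIR as exponentially decaying noise, and auralizes a dry source by convolution. This is a minimal NumPy sketch under those assumptions: the SceneEstimate structure, the decaying-noise RIR model, and all parameter values are hypothetical stand-ins, not SAMOSA's actual method.

```python
# Minimal sketch (NOT SAMOSA's implementation) of scene-aware acoustic rendering:
# fused scene estimate -> acoustic prior (RT60) -> synthesized RIR -> convolution.
from dataclasses import dataclass

import numpy as np


@dataclass
class SceneEstimate:
    """Hypothetical fused multimodal scene representation."""
    volume_m3: float  # room volume from geometry estimation
    # material name -> (surface area in m^2, absorption coefficient)
    surfaces: dict[str, tuple[float, float]]


def rt60_sabine(scene: SceneEstimate) -> float:
    """Reverberation-time prior from geometry + materials (Sabine's equation)."""
    absorption = sum(area * alpha for area, alpha in scene.surfaces.values())
    return 0.161 * scene.volume_m3 / absorption


def synthesize_rir(rt60: float, sr: int = 48_000) -> np.ndarray:
    """Toy RIR: exponentially decaying noise whose tail matches the RT60 prior."""
    n = int(rt60 * sr)
    t = np.arange(n) / sr
    decay = np.exp(-6.91 * t / rt60)  # -60 dB amplitude at t = rt60 (ln(1000) ~ 6.91)
    rng = np.random.default_rng(0)
    rir = rng.standard_normal(n) * decay
    rir[0] = 1.0  # direct-path impulse
    return rir / np.max(np.abs(rir))


def render(dry: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Auralize a dry source signal by convolving it with the synthesized RIR."""
    wet = np.convolve(dry, rir)
    return wet / np.max(np.abs(wet))


# Example: a small, fairly absorptive room (illustrative values).
scene = SceneEstimate(
    volume_m3=60.0,
    surfaces={"carpet": (20.0, 0.30), "drywall": (70.0, 0.10), "glass": (10.0, 0.05)},
)
rt60 = rt60_sabine(scene)  # ~0.72 s for these values
rir = synthesize_rir(rt60)
```

Convolving a dry source with an RIR is the standard auralization step; what the paper adds on top of such a baseline is real-time, on-device estimation of the scene inputs and semantic-driven calibration, which this sketch does not attempt.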
Similar Papers
Hearing Anywhere in Any Environment
CV and Pattern Recognition
Makes virtual sounds feel real in any room.
Real-Time Auralization for First-Person Vocal Interaction in Immersive Virtual Environments
Audio and Speech Processing
Makes virtual reality sound like real places.
Sonify Anything: Towards Context-Aware Sonic Interactions in AR
Human-Computer Interaction
Makes virtual objects sound real when they make contact.