Building Audio-Visual Digital Twins with Smartphones
By: Zitong Lan, Yiwei Tang, Yuhan Wang and more
Potential Business Impact:
Creates digital twins that hear and see.
Digital twins today are almost entirely visual, overlooking acoustics, a core component of spatial realism and interaction. We introduce AV-Twin, the first practical system that constructs editable audio-visual digital twins using only commodity smartphones. AV-Twin combines mobile room impulse response (RIR) capture with a visual-assisted acoustic field model to efficiently reconstruct room acoustics. It further recovers per-surface material properties through differentiable acoustic rendering, enabling users to modify materials, geometry, and layout while both audio and visuals update automatically. Together, these capabilities establish a practical path toward fully modifiable audio-visual digital twins for real-world environments.
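For readers unfamiliar with the term, a room impulse response describes how a room transforms a sound between a source and a listener, and convolving a dry recording with it yields the sound as heard in that room. The sketch below illustrates only this general idea. It is not AV-Twin's pipeline; the function name `render_with_rir` and the toy values are hypothetical, chosen for illustration.

```python
def render_with_rir(dry, rir):
    """Convolve a dry (anechoic) signal with a room impulse response.

    Plain-Python direct convolution, kept dependency-free for clarity.
    Both inputs are lists of float samples at the same sample rate.
    """
    out = [0.0] * (len(dry) + len(rir) - 1)
    for i, x in enumerate(dry):
        for j, h in enumerate(rir):
            out[i + j] += x * h
    return out


# Toy RIR (assumed values): direct sound at sample 0, plus one
# reflection at half amplitude arriving 3 samples later.
toy_rir = [1.0, 0.0, 0.0, 0.5]

# A single click as the dry source signal.
click = [1.0, 0.0]

wet = render_with_rir(click, toy_rir)
print(wet)
```

With real recordings, an FFT-based routine such as `scipy.signal.fftconvolve` would typically replace the nested loop, since measured RIRs often span tens of thousands of samples.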
Similar Papers
Audio-Visual World Models: Towards Multisensory Imagination in Sight and Sound
Multimedia
Robots learn to see and hear to navigate better.
Differentiable Room Acoustic Rendering with Multi-View Vision Priors
CV and Pattern Recognition
Makes virtual worlds sound real with less data.
Enhancing XR Auditory Realism via Multimodal Scene-Aware Acoustic Rendering
Human-Computer Interaction
Makes virtual sounds feel real in any space.