Multi-Channel Differential ASR for Robust Wearer Speech Recognition on Smart Glasses
By: Yufeng Yang, Yiteng Huang, Yong Xu, and more
Potential Business Impact:
Suppresses side-talk and background noise for more reliable voice commands.
With the growing adoption of wearable devices such as smart glasses for AI assistants, wearer speech recognition (WSR) is becoming increasingly critical to next-generation human-computer interfaces. However, in real environments, interference from side-talk speech remains a significant challenge to WSR and may cause accumulated errors for downstream tasks such as natural language processing. In this work, we introduce a novel multi-channel differential automatic speech recognition (ASR) method for robust WSR on smart glasses. The proposed system takes differential inputs from different frontends that complement each other to improve the robustness of WSR, including a beamformer, microphone selection, and a lightweight side-talk detection model. Evaluations on both simulated and real datasets demonstrate that the proposed system outperforms the traditional approach, achieving up to an 18.0% relative reduction in word error rate.
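The abstract describes feeding the ASR backend complementary frontend outputs: a beamformer and a microphone-selection stage. As a rough, hypothetical illustration of those two frontends (not the authors' actual system; the array geometry, delays, and energy-based selection heuristic here are all assumptions), a minimal sketch might look like:

```python
import numpy as np

def delay_and_sum(signals: np.ndarray, delays: list[int]) -> np.ndarray:
    """Simple delay-and-sum beamformer: align each channel by an
    integer sample delay toward the wearer's mouth, then average.
    `signals` has shape (channels, samples)."""
    out = np.zeros(signals.shape[1])
    for channel, delay in zip(signals, delays):
        # advance the channel by its steering delay (circular shift
        # is fine for this toy illustration)
        out += np.roll(channel, -delay)
    return out / signals.shape[0]

def select_mic(signals: np.ndarray) -> np.ndarray:
    """Crude microphone selection: pick the channel with the highest
    energy, a rough proxy for proximity to the wearer."""
    energies = (signals ** 2).sum(axis=1)
    return signals[int(np.argmax(energies))]

# A differential ASR backend would then consume both streams,
# e.g. features(beamformed) and features(selected), side by side.
```

In a real system the two streams would be converted to features and fed jointly to the recognizer, letting each frontend compensate for conditions where the other fails (e.g., beamforming mis-steering vs. a single noisy microphone).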
Similar Papers
Elevating Robust Multi-Talker ASR by Decoupling Speaker Separation and Speech Recognition
Sound
Lets computers understand speech in noisy, multi-talker rooms.
Bridging the Reality Gap: Efficient Adaptation of ASR systems for Challenging Low-Resource Domains
Computation and Language
Makes doctors' notes understandable by computers.
AD-AVSR: Asymmetric Dual-stream Enhancement for Robust Audio-Visual Speech Recognition
Multimedia
Helps computers understand speech even amid loud noise.