SW-ASR: A Context-Aware Hybrid ASR Pipeline for Robust Single Word Speech Recognition
By: Manali Sharma, Riya Naik, Buvaneshwari G
Potential Business Impact:
Helps computers understand spoken words in noisy places.
Single-word Automatic Speech Recognition (ASR) is a challenging task due to the lack of linguistic context and sensitivity to noise, pronunciation variation, and channel artifacts, especially in low-resource, communication-critical domains such as healthcare and emergency response. This paper reviews recent deep learning approaches and proposes a modular framework for robust single-word detection. The system combines denoising and normalization with a hybrid ASR front end (Whisper + Vosk) and a verification layer designed to handle out-of-vocabulary words and degraded audio. The verification layer supports multiple matching strategies, including embedding similarity, edit distance, and LLM-based matching with optional contextual guidance. We evaluate the framework on the Google Speech Commands dataset and a curated real-world dataset collected from telephony and messaging platforms under bandwidth-limited conditions. Results show that while the hybrid ASR front end performs well on clean audio, the verification layer significantly improves accuracy on noisy and compressed channels. Context-guided and LLM-based matching yield the largest gains, demonstrating that lightweight verification and context mechanisms can substantially improve single-word ASR robustness while meeting the latency requirements of real-time telephony applications.
Similar Papers
AS-ASR: A Lightweight Framework for Aphasia-Specific Automatic Speech Recognition
Audio and Speech Processing
Helps people with speech problems talk to computers.
Speech-Aware Long Context Pruning and Integration for Contextualized Automatic Speech Recognition
Computation and Language
Listens better to long talks, even with noise.
Index-ASR Technical Report
Sound
Makes voice assistants understand better, with fewer mistakes.