From Silent Signals to Natural Language: A Dual-Stage Transformer-LLM Approach
By: Nithyashree Sivasubramaniam
Potential Business Impact:
Lets computers understand silent speech better.
Silent Speech Interfaces (SSIs) have gained attention for their ability to generate intelligible speech from non-acoustic signals. While significant progress has been made in advancing speech generation pipelines, limited work has addressed the recognition and downstream processing of synthesized speech, which often suffers from phonetic ambiguity and noise. To overcome these challenges, we propose an enhanced automatic speech recognition framework that combines a transformer-based acoustic model with a large language model (LLM) for post-processing. The transformer captures full utterance context, while the LLM enforces linguistic consistency. Experimental results show a 6-point absolute (16% relative) reduction in word error rate (WER) from a 36% baseline, demonstrating a substantial improvement in intelligibility for silent speech interfaces.
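To make the dual-stage idea concrete, here is a minimal sketch of such a pipeline: a transformer acoustic model transcribes the synthesized speech, and an LLM post-processes the hypothesis for linguistic consistency. The model names, prompt, and file path below are illustrative assumptions, not the configuration used in the paper.

```python
# Sketch of a dual-stage ASR + LLM post-processing pipeline.
# Stage 1: transformer acoustic model (full-utterance context).
# Stage 2: LLM used purely as a text post-processor.
from transformers import pipeline

# Hypothetical model choices; swap in whatever checkpoints are available.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
llm = pipeline("text-generation", model="Qwen/Qwen2.5-1.5B-Instruct")


def transcribe_and_correct(audio_path: str) -> str:
    """Run ASR, then ask the LLM to repair phonetically ambiguous words."""
    hypothesis = asr(audio_path)["text"]

    prompt = (
        "The following transcript comes from noisy, synthesized silent speech. "
        "Correct likely misrecognized words while preserving the meaning. "
        "Return only the corrected sentence.\n"
        f"Transcript: {hypothesis}\nCorrected:"
    )
    out = llm(prompt, max_new_tokens=64, do_sample=False)[0]["generated_text"]
    # The text-generation pipeline returns the prompt plus the continuation,
    # so strip the prompt to keep only the corrected transcript.
    return out[len(prompt):].strip()


if __name__ == "__main__":
    # Hypothetical input file for illustration.
    print(transcribe_and_correct("example_utterance.wav"))
```

In practice the post-processing prompt, decoding settings, and choice of LLM would be tuned on held-out data; this sketch only shows how the two stages connect.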
Similar Papers
Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs
Computation and Language
New AI is better at translating spoken words.
SpeechLLM: Unified Speech and Language Model for Enhanced Multi-Task Understanding in Low Resource Settings
Computation and Language
Lets computers understand spoken words across many tasks.
MultiStream-LLM: Bridging Modalities for Robust Sign Language Translation
Computation and Language
Translates sign language better by combining multiple input streams.