Customizing Speech Recognition Model with Large Language Model Feedback
By: Shaoshi Ling, Guoli Ye
Potential Business Impact:
Helps computers understand rare words in speech.
Automatic speech recognition (ASR) systems have achieved strong performance on general transcription tasks. However, they continue to struggle with recognizing rare named entities and adapting to domain mismatches. In contrast, large language models (LLMs), trained on massive internet-scale datasets, are often more effective across a wide range of domains. In this work, we propose a reinforcement-learning-based approach for unsupervised domain adaptation that leverages unlabeled data and feedback from an LLM to improve transcription quality, particularly for named entities affected by domain mismatch. Given contextual information, our framework employs an LLM as the reward model to score the hypotheses produced by the ASR model. These scores serve as reward signals to fine-tune the ASR model via reinforcement learning. Our method achieves a 21% improvement in entity word error rate over conventional self-training methods.
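The reward loop described above can be illustrated with a minimal sketch. It assumes a generic REINFORCE-style update over an N-best list with a mean-reward baseline; the abstract does not specify the exact objective, so this is not necessarily the authors' formulation. The names `llm_reward_stub`, `compute_advantages`, and `policy_gradient_loss` are hypothetical, and the reward stub merely counts context words so the example runs offline, where a real system would query an LLM.

```python
from typing import Callable, List, Sequence


def llm_reward_stub(hypothesis: str, context: str) -> float:
    """Hypothetical stand-in for an LLM reward model (assumption).

    A real system would prompt an LLM with the context and the hypothesis and
    turn its judgment into a score. Here we count how many hypothesis words
    also appear in the context, purely so the sketch is self-contained.
    """
    context_words = set(context.lower().split())
    return float(sum(w in context_words for w in hypothesis.lower().split()))


def compute_advantages(
    hypotheses: Sequence[str],
    context: str,
    reward_fn: Callable[[str, str], float] = llm_reward_stub,
) -> List[float]:
    """Score each ASR hypothesis, then subtract the N-best mean as a baseline."""
    rewards = [reward_fn(h, context) for h in hypotheses]
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


def policy_gradient_loss(
    hypotheses: Sequence[str],
    log_probs: Sequence[float],
    context: str,
    reward_fn: Callable[[str, str], float] = llm_reward_stub,
) -> float:
    """REINFORCE loss: -(1/N) * sum_i advantage_i * log p(hypothesis_i).

    log_probs are the ASR model's sequence log-probabilities. In a training
    framework they would be differentiable tensors; minimizing this loss pushes
    probability toward hypotheses whose reward beats the N-best average.
    """
    advantages = compute_advantages(hypotheses, context, reward_fn)
    total = sum(a * lp for a, lp in zip(advantages, log_probs))
    return -total / len(hypotheses)


if __name__ == "__main__":
    context = "meeting with Shaoshi Ling about ASR"
    nbest = ["meeting with shaoshi ling", "meeting with show she link"]
    asr_log_probs = [-2.0, -1.0]  # the ASR model currently prefers the wrong entity
    print(compute_advantages(nbest, context))
    print(policy_gradient_loss(nbest, asr_log_probs, context))
```

In this toy run the hypothesis containing the correct entity receives a positive advantage (1.0) and the misrecognition a negative one (-1.0), so a gradient step on the loss would raise the probability of the first hypothesis even though the ASR model initially ranked it lower. Subtracting the mean reward is a common variance-reduction choice; other baselines are equally plausible.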
Similar Papers
Improving Named Entity Transcription with Contextual LLM-based Revision
Computation and Language
Fixes computer speech errors for important names.
Explore the Reinforcement Learning for the LLM based ASR and TTS system
Sound
Makes talking computers understand and speak better.
Large Language Models based ASR Error Correction for Child Conversations
Computation and Language
Makes computers understand kids' talking better.