Retrieval Augmented Generation based context discovery for ASR
By: Dimitrios Siskos, Stavros Papadopoulos, Pablo Peso Parada and more
Potential Business Impact:
Makes voice recorders understand tricky words better.
This work investigates retrieval augmented generation (RAG) as an efficient strategy for automatic context discovery in context-aware Automatic Speech Recognition (ASR) systems, with the goal of improving transcription accuracy in the presence of rare or out-of-vocabulary terms. Identifying the right context automatically remains an open challenge, and this work addresses it with an efficient embedding-based retrieval approach. To contextualize its effectiveness, two alternatives based on large language models (LLMs) are also evaluated: (1) LLM-based context generation via prompting, and (2) post-recognition transcript correction using LLMs. Experiments on the TED-LIUMv3, Earnings21 and SPGISpeech datasets demonstrate that the proposed approach reduces word error rate (WER) by up to 17% (percentage difference) relative to using no context, while the oracle context yields a reduction of up to 24.1%.
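The embedding-based retrieval idea can be sketched as follows: embed a first-pass ASR hypothesis, score candidate context chunks by cosine similarity, and feed the top matches to a context-aware second pass. This is a minimal illustration, not the paper's implementation; the toy hashing embedder stands in for whatever sentence encoder the authors actually use, and the corpus and hypothesis strings are invented for the example.

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy bag-of-words hashing embedder (hypothetical stand-in for a
    real sentence encoder). Returns an L2-normalized vector."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def retrieve_context(hypothesis: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus chunks most similar to a first-pass ASR
    hypothesis, to be used as biasing context for re-decoding."""
    q = embed(hypothesis)
    # Cosine similarity reduces to a dot product on normalized vectors.
    scores = [float(q @ embed(chunk)) for chunk in corpus]
    top = np.argsort(scores)[::-1][:k]
    return [corpus[i] for i in top]

# Invented example data: a tiny document collection and a noisy first-pass hypothesis.
corpus = [
    "Earnings call: EBITDA margins and Q3 guidance for Acme Corp.",
    "TED talk transcript about coral reef restoration.",
    "Glossary of medical imaging terms: MRI, fMRI, DTI.",
]
hypothesis = "the speaker discussed q3 guidance and ebitda margins"
print(retrieve_context(hypothesis, corpus, k=1))
```

In a full system the retrieved chunks would be passed to the contextual biasing mechanism of the ASR model; here they are simply printed.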
Similar Papers
Learning Contextual Retrieval for Robust Conversational Search
Information Retrieval
Helps search engines remember what you asked before.
ELITE: Embedding-Less retrieval with Iterative Text Exploration
Computation and Language
Helps computers remember more for better answers.
Improving Named Entity Transcription with Contextual LLM-based Revision
Computation and Language
Fixes computer speech errors for important names.