Listening, Imagining & Refining: A Heuristic Optimized ASR Correction Framework with LLMs
By: Yutong Liu, Ziyue Zhang, Cheng Huang and more
Potential Business Impact:
Makes voice assistants understand words better.
Automatic Speech Recognition (ASR) systems remain prone to errors that affect downstream applications. In this paper, we propose LIR-ASR, a heuristic-optimized iterative correction framework using LLMs, inspired by human auditory perception. LIR-ASR applies a "Listening-Imagining-Refining" strategy, generating phonetic variants and refining them in context. A heuristic optimization with a finite state machine (FSM) is introduced to prevent the correction process from being trapped in local optima, and rule-based constraints help maintain semantic fidelity. Experiments on both English and Chinese ASR outputs show that LIR-ASR achieves average reductions in CER/WER of up to 1.5 percentage points compared to baselines, demonstrating substantial accuracy gains in transcription.
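For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how word error rate (WER) and character error rate (CER) are conventionally computed from edit distance. It is not the paper's evaluation code: the function names and the example sentences are hypothetical, and real evaluations usually normalize text (casing, punctuation) first. A reduction in "percentage points" is the absolute difference between two such rates, each expressed as a percentage.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distance from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference token
                            curr[j - 1] + 1,      # insertion of a hypothesis token
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance, ignoring spaces."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, list(hypothesis.replace(" ", ""))) / len(ref_chars)


if __name__ == "__main__":
    # Hypothetical example sentences, not taken from the paper.
    reference = "recognize speech with a model"
    raw_asr = "wreck a nice beach with a model"
    corrected = "recognize speech with the model"

    before = wer(reference, raw_asr) * 100
    after = wer(reference, corrected) * 100
    print(f"WER before correction: {before:.1f}%")
    print(f"WER after correction:  {after:.1f}%")
    print(f"Reduction: {before - after:.1f} percentage points")
```

WER is the usual choice for English, where words are space-delimited, while CER is the usual choice for Chinese, where the character is the natural unit.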
Similar Papers
Listening, Imagining & Refining: A Heuristic Optimized ASR Correction Framework with LLMs
Audio and Speech Processing
Makes voice assistants understand words better.
FunAudio-ASR Technical Report
Computation and Language
Makes talking computers understand messy, noisy speech.