RAIR: Retrieval-Augmented Iterative Refinement for Chinese Spelling Correction
By: Junhong Liang, Yu Zhou
Potential Business Impact:
Fixes spelling errors in specialized-domain Chinese text.
Chinese Spelling Correction (CSC) aims to detect and correct erroneous tokens in sentences. Traditional CSC focuses on equal-length correction and relies on pretrained language models (PLMs). Although Large Language Models (LLMs) have shown remarkable success in identifying and rectifying potential errors, they often struggle to adapt to domain-specific corrections, especially when they encounter terminology from specialized domains. To address domain adaptation, we propose a Retrieval-Augmented Iterative Refinement (RAIR) framework. Our approach adaptively constructs a retrieval corpus from domain-specific training data and dictionaries, and uses a fine-tuned retriever so that retrieval captures error-correction patterns. We also extend the equal-length setting to variable-length correction scenarios. Extensive experiments demonstrate that our framework outperforms current approaches in domain spelling correction and significantly improves the performance of LLMs in variable-length scenarios.
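The retrieve-then-refine loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the corpus entries, the bigram-overlap scorer (standing in for the fine-tuned retriever), the string-replacement corrector (standing in for the LLM), and the names `CORPUS`, `retrieve`, `correct_once`, and `refine` are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical domain corpus of (erroneous span, corrected span) pairs.
# The last pair changes the length, as in variable-length correction.
CORPUS: List[Tuple[str, str]] = [
    ("阿斯匹林", "阿司匹林"),
    ("高血亚", "高血压"),
    ("糖尿并", "糖尿病"),
    ("心梗塞", "心肌梗塞"),
]


def bigrams(text: str) -> List[str]:
    """Return the character bigrams of a string."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def retrieve(sentence: str, k: int = 1) -> List[Tuple[str, str]]:
    """Toy retriever: rank corpus pairs by how many bigrams of the
    erroneous span occur in the sentence (assumed stand-in for a
    fine-tuned retriever)."""
    scored = [
        (sum(bg in sentence for bg in bigrams(wrong)), (wrong, right))
        for wrong, right in CORPUS
    ]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pair for _, pair in scored[:k]]


def correct_once(sentence: str, hints: List[Tuple[str, str]]) -> str:
    """Toy corrector: apply the retrieved pairs as literal replacements
    (assumed stand-in for an LLM prompted with the retrieved examples)."""
    for wrong, right in hints:
        sentence = sentence.replace(wrong, right)
    return sentence


def refine(sentence: str, max_rounds: int = 3) -> str:
    """Alternate retrieval and correction until the sentence stops
    changing or the round budget is exhausted."""
    for _ in range(max_rounds):
        revised = correct_once(sentence, retrieve(sentence))
        if revised == sentence:
            break
        sentence = revised
    return sentence


if __name__ == "__main__":
    print(refine("患者有高血亚，服用阿斯匹林"))
    print(refine("诊断为心梗塞"))
```

With `k=1`, each round retrieves only the best-matching pair, so a sentence with two errors takes two rounds to fix; this is the sense in which the loop is iterative. A real system would replace both stand-ins and would need a stopping rule that is more robust than exact string equality.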
Similar Papers
SCIR: A Self-Correcting Iterative Refinement Framework for Enhanced Information Extraction Based on Schema
Computation and Language
Makes computers understand information better, cheaper.
Listening, Imagining & Refining: A Heuristic Optimized ASR Correction Framework with LLMs
Audio and Speech Processing
Makes voice assistants understand words better.
Reasoning-Intensive Regression
Computation and Language
Helps computers find hidden numbers in text.