
RAIR: Retrieval-Augmented Iterative Refinement for Chinese Spelling Correction

Published: April 26, 2025 | arXiv ID: 2504.18938v2

By: Junhong Liang, Yu Zhou

Potential Business Impact:

Fixes Chinese spelling errors in specialized, terminology-heavy texts.

Business Areas:
Semantic Search, Internet Services

Chinese Spelling Correction (CSC) aims to detect and correct erroneous tokens in sentences. Traditional CSC focuses on equal-length correction and relies on pretrained language models (PLMs). While Large Language Models (LLMs) have shown remarkable success in identifying and rectifying potential errors, they often struggle to adapt to domain-specific corrections, especially when encountering terminology from specialized domains. To address domain adaptation, we propose a Retrieval-Augmented Iterative Refinement (RAIR) framework. Our approach adaptively constructs a retrieval corpus from domain-specific training data and dictionaries, and employs a fine-tuned retriever so that retrieval captures error-correction patterns. We also extend CSC from equal-length to variable-length correction scenarios. Extensive experiments demonstrate that our framework outperforms current approaches in domain-specific spelling correction and significantly improves the performance of LLMs in variable-length scenarios.
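The retrieve-then-revise loop described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The corpus holds hand-written (erroneous, correct) pairs rather than patterns mined from training data and dictionaries. A character-bigram overlap score stands in for the fine-tuned retriever, and `correct_once` is a hypothetical rule-based placeholder for the LLM call. All names and corpus entries are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """A retrieval-corpus entry: an erroneous span and its correction (illustrative)."""
    wrong: str
    right: str


# Hypothetical domain corpus (medical terms); the pairs are invented for illustration.
CORPUS = [
    Pattern("阿斯匹林", "阿司匹林"),  # equal-length substitution
    Pattern("青梅素", "青霉素"),      # equal-length substitution
    Pattern("胰岛岛素", "胰岛素"),    # variable-length: duplicated character removed
]


def bigrams(text: str) -> set[str]:
    """Character bigrams; a single character falls back to itself."""
    return {text[i:i + 2] for i in range(len(text) - 1)} or {text}


def score(sentence: str, pattern: Pattern) -> float:
    """Share of the pattern's erroneous bigrams that also occur in the sentence.

    A simple lexical stand-in for the paper's fine-tuned retriever (assumption).
    """
    grams = bigrams(pattern.wrong)
    return len(grams & bigrams(sentence)) / len(grams)


def retrieve(sentence: str, corpus: list[Pattern], k: int = 2) -> list[Pattern]:
    """Return the top-k patterns; sorted() is stable, so ties keep corpus order."""
    return sorted(corpus, key=lambda p: score(sentence, p), reverse=True)[:k]


def correct_once(sentence: str, hints: list[Pattern]) -> str:
    """Hypothetical placeholder for one LLM call: apply the first retrieved hint that matches."""
    for p in hints:
        if p.wrong in sentence:
            return sentence.replace(p.wrong, p.right, 1)
    return sentence


def refine(sentence: str, corpus: list[Pattern], max_iters: int = 3) -> str:
    """Retrieve with the current draft, revise it, and stop once a pass changes nothing."""
    draft = sentence
    for _ in range(max_iters):
        revised = correct_once(draft, retrieve(draft, corpus))
        if revised == draft:
            break
        draft = revised
    return draft


if __name__ == "__main__":
    print(refine("患者服用阿斯匹林后对青梅素过敏", CORPUS))  # equal-length case
    print(refine("每天注射胰岛岛素", CORPUS))                # variable-length case
```

The first sentence needs two passes because each pass applies a single retrieved pattern and then re-retrieves against the revised draft, which is the point of iterating. The second sentence shows a correction that shortens the text, something an equal-length model, which emits exactly one output character per input character, cannot express.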


Page Count
13 pages

Category
Computer Science:
Computation and Language