KoGEC: Korean Grammatical Error Correction with Pre-trained Translation Models
By: Taeeun Kim, Semin Jeong, Youngsook Song
Potential Business Impact:
Fixes Korean writing mistakes better than big AI.
This research introduces KoGEC, a Korean Grammatical Error Correction system using pre-trained translation models. We fine-tuned NLLB (No Language Left Behind) models for Korean GEC, comparing their performance against large language models like GPT-4 and HCX-3. The study used two social media conversation datasets for training and testing. The NLLB models were fine-tuned using special language tokens to distinguish between original and corrected Korean sentences. Evaluation was done using BLEU scores and an "LLM as judge" method to classify error types. Results showed that the fine-tuned NLLB (KoGEC) models outperformed GPT-4o and HCX-3 in Korean GEC tasks. KoGEC demonstrated a more balanced error correction profile across various error types, whereas the larger LLMs tended to focus less on punctuation errors. We also developed a Chrome extension to make the KoGEC system accessible to users. Finally, we explored token vocabulary expansion to further improve the model but found it to decrease model performance. This research contributes to the field of NLP by providing an efficient, specialized Korean GEC system and a new evaluation method. It also highlights the potential of compact, task-specific models to compete with larger, general-purpose language models in specialized NLP tasks.
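To make the fine-tuning setup concrete, below is a minimal sketch of how an NLLB checkpoint could be adapted for Korean GEC by treating correction as "translation" from erroneous to corrected Korean. The checkpoint name, the custom token names (kor_err, kor_cor), and the preprocessing details are illustrative assumptions, not the exact configuration reported in the paper.

```python
# Sketch: fine-tuning an NLLB model for Korean GEC with added language-style tokens.
# Assumptions: checkpoint choice, token names, and max length are illustrative only.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "facebook/nllb-200-distilled-600M"  # assumed NLLB checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Hypothetical special tokens marking the "original" vs. "corrected" Korean side.
tokenizer.add_special_tokens({"additional_special_tokens": ["kor_err", "kor_cor"]})
model.resize_token_embeddings(len(tokenizer))

def encode_pair(src_sentence: str, tgt_sentence: str):
    """Tokenize one (erroneous, corrected) sentence pair for seq2seq training."""
    return tokenizer(
        src_sentence,
        text_target=tgt_sentence,
        truncation=True,
        max_length=128,
    )

# Example usage with a toy pair; real training would map this over the
# social media conversation datasets and pass the result to a Seq2SeqTrainer.
batch = encode_pair("오늘 날씨가 너무 춥어요.", "오늘 날씨가 너무 추워요.")
```

Framing GEC as a translation task is what lets a compact, pre-trained multilingual translation model like NLLB be reused directly, rather than training a dedicated correction architecture from scratch.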
Similar Papers
Adapting LLMs for Minimal-edit Grammatical Error Correction
Computation and Language
Fixes English grammar with fewer changes.
"When Data is Scarce, Prompt Smarter"... Approaches to Grammatical Error Correction in Low-Resource Settings
Computation and Language
Fixes grammar mistakes in many languages.
Explanation based In-Context Demonstrations Retrieval for Multilingual Grammatical Error Correction
Computation and Language
Fixes writing mistakes by understanding why they are wrong.