Score: 1

Finetuning Vision-Language Models as OCR Systems for Low-Resource Languages: A Case Study of Manchu

Published: July 9, 2025 | arXiv ID: 2507.06761v1

By: Yan Hon Michael Chung, Donghyeok Choi

Potential Business Impact:

Enables automated reading of rare historical Manchu documents.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Manchu, a critically endangered language essential for understanding early modern Eastern Eurasian history, lacks effective OCR systems that can handle real-world historical documents. This study develops high-performing OCR systems by fine-tuning three open-source vision-language models (LLaMA-3.2-11B, Qwen2.5-VL-7B, Qwen2.5-VL-3B) on 60,000 synthetic Manchu word images using parameter-efficient training. LLaMA-3.2-11B achieved exceptional performance with 98.3% word accuracy and 0.0024 character error rate on synthetic data, while crucially maintaining 93.1% accuracy on real-world handwritten documents. Comparative evaluation reveals substantial advantages over traditional approaches: while a CRNN baseline achieved 99.8% synthetic accuracy, it suffered severe degradation to 72.5% on real documents. Our approach demonstrates effective synthetic-to-real domain transfer, providing a cost-effective solution deployable on accessible infrastructure. This work establishes a transferable framework for endangered language OCR that removes technical and financial barriers in digital humanities, enabling historians and linguists to process historical archives without specialized computing resources. Code and model weights are available at https://github.com/mic7ch1/ManchuAI-OCR.
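For context on the reported metrics: character error rate (CER) is conventionally computed as total edit distance between predicted and reference transcriptions divided by total reference characters. The sketch below is a minimal, self-contained illustration of that standard formula, not the authors' evaluation code (function names are hypothetical):

```python
def edit_distance(ref: str, hyp: str) -> int:
    # Classic dynamic-programming Levenshtein distance
    # between a reference string and a hypothesis string.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (r != h),    # substitution (0 if match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(refs: list[str], hyps: list[str]) -> float:
    # CER = sum of per-pair edit distances / total reference characters.
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars
```

A CER of 0.0024, as reported for LLaMA-3.2-11B on synthetic data, corresponds to roughly one character error per ~417 reference characters under this definition.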

Country of Origin
🇭🇰 Hong Kong

Repos / Data Links
https://github.com/mic7ch1/ManchuAI-OCR

Page Count
12 pages

Category
Computer Science:
Computer Vision and Pattern Recognition