Context Volume Drives Performance: Tackling Domain Shift in Extremely Low-Resource Translation via RAG

Published: January 15, 2026 | arXiv ID: 2601.09982v1

By: David Samuel Setiawan, Raphaël Merx, Jey Han Lau

Neural Machine Translation (NMT) models for low-resource languages suffer significant performance degradation under domain shift. We quantify this challenge using Dhao, an indigenous language of Eastern Indonesia with no digital footprint beyond the New Testament (NT). When applied to the unseen Old Testament (OT), a standard NMT model fine-tuned on the NT drops from an in-domain score of 36.17 chrF++ to 27.11 chrF++. To recover this loss, we introduce a hybrid framework where a fine-tuned NMT model generates an initial draft, which is then refined by a Large Language Model (LLM) using Retrieval-Augmented Generation (RAG). The final system achieves 35.21 chrF++ (+8.10 recovery), effectively matching the original in-domain quality. Our analysis reveals that this performance is driven primarily by the number of retrieved examples rather than the choice of retrieval algorithm. Qualitative analysis confirms the LLM acts as a robust "safety net," repairing severe failures in zero-shot domains.
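
To make the draft-then-refine pipeline concrete, here is a minimal Python sketch of the idea: retrieve in-domain examples similar to the input, then ask an LLM to refine the NMT draft conditioned on them. The function names, the word-overlap retrieval heuristic, the prompt wording, and the `call_llm` callable are all illustrative assumptions, not the authors' implementation; the abstract itself notes that the number of retrieved examples matters more than the retrieval algorithm.

```python
"""Sketch of a draft-then-refine RAG pipeline (illustrative, not the paper's code)."""

from typing import Callable, List, Tuple


def retrieve_examples(source: str,
                      memory: List[Tuple[str, str]],
                      k: int = 20) -> List[Tuple[str, str]]:
    """Return the k (source, translation) pairs from the in-domain memory
    (e.g., the New Testament bitext) most similar to the input sentence.
    Word overlap is a placeholder retrieval score."""
    src_tokens = set(source.lower().split())

    def overlap(pair: Tuple[str, str]) -> float:
        cand = set(pair[0].lower().split())
        return len(src_tokens & cand) / max(len(src_tokens | cand), 1)

    return sorted(memory, key=overlap, reverse=True)[:k]


def refine_draft(source: str,
                 draft: str,
                 memory: List[Tuple[str, str]],
                 call_llm: Callable[[str], str],
                 k: int = 20) -> str:
    """Refine an NMT draft with an LLM, conditioning on retrieved examples.
    `call_llm` is any text-in/text-out LLM client supplied by the caller."""
    examples = retrieve_examples(source, memory, k)
    shots = "\n".join(f"Source: {s}\nTranslation: {t}" for s, t in examples)
    prompt = (
        "You are refining a machine translation into Dhao.\n"
        f"Reference examples:\n{shots}\n\n"
        f"Source: {source}\n"
        f"Draft translation: {draft}\n"
        "Improved translation:"
    )
    return call_llm(prompt)
```

In this sketch, increasing `k` corresponds to the "context volume" lever highlighted in the title: more retrieved examples give the LLM more in-domain evidence when repairing the draft.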

Category
Computer Science:
Computation and Language