Multilingual Contextualization of Large Language Models for Document-Level Machine Translation
By: Miguel Moura Ramos, Patrick Fernandes, Sweta Agrawal, and more
Potential Business Impact:
Translates whole books, not just sentences.
Large language models (LLMs) have demonstrated strong performance in sentence-level machine translation, but scaling to document-level translation remains challenging, particularly in modeling long-range dependencies and discourse phenomena across sentences and paragraphs. In this work, we propose a method to improve LLM-based long-document translation through targeted fine-tuning on high-quality document-level data, which we curate and introduce as DocBlocks. Our approach supports multiple translation paradigms, including direct document-to-document and chunk-level translation, by integrating instructions both with and without surrounding context. This enables models to better capture cross-sentence dependencies while maintaining strong sentence-level translation performance. Experimental results show that incorporating multiple translation paradigms improves document-level translation quality and inference speed compared to prompting and agent-based methods.
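To make the chunk-level paradigm concrete, here is a minimal sketch of how translation prompts can be built both with and without surrounding context and applied chunk by chunk. The prompt wording, function names, and sliding-window context handling are illustrative assumptions, not the paper's actual DocBlocks data format or fine-tuning recipe.

```python
# Illustrative sketch only: chunk-level translation prompts with and without
# preceding context. Prompt templates and helper names are hypothetical.
from typing import List, Optional


def build_chunk_prompt(chunk: str,
                       src_lang: str,
                       tgt_lang: str,
                       context: Optional[str] = None) -> str:
    """Return an instruction-style prompt for translating one chunk.

    If `context` (e.g. previously translated text) is supplied, it is
    prepended so the model can resolve cross-sentence dependencies such
    as pronouns and lexical cohesion.
    """
    if context:
        return (
            f"Translate the following {src_lang} text into {tgt_lang}, "
            f"staying consistent with the preceding context.\n\n"
            f"Context:\n{context}\n\nText:\n{chunk}\n\nTranslation:"
        )
    return (
        f"Translate the following {src_lang} text into {tgt_lang}.\n\n"
        f"Text:\n{chunk}\n\nTranslation:"
    )


def translate_document(chunks: List[str],
                       translate_fn,
                       src_lang: str = "English",
                       tgt_lang: str = "German",
                       context_window: int = 2) -> List[str]:
    """Translate a document chunk by chunk, feeding a sliding window of
    previously produced translations back in as context."""
    outputs: List[str] = []
    for chunk in chunks:
        context = "\n".join(outputs[-context_window:]) if outputs else None
        prompt = build_chunk_prompt(chunk, src_lang, tgt_lang, context)
        # translate_fn wraps whatever LLM call is used (API or local model).
        outputs.append(translate_fn(prompt))
    return outputs
```

In this sketch, calling `translate_document` with `context_window=0` recovers plain context-free chunk translation, while larger windows expose cross-sentence context; mixing both prompt styles during fine-tuning mirrors the abstract's idea of integrating instructions with and without surrounding context.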
Similar Papers
Beyond the Sentence: A Survey on Context-Aware Machine Translation with Large Language Models
Computation and Language
Makes computer translations understand more context.
Two Intermediate Translations Are Better Than One: Fine-tuning LLMs for Document-level Translation Refinement
Computation and Language
Makes translated documents sound more natural.
Improving LLM-based Document-level Machine Translation with Multi-Knowledge Fusion
Computation and Language
Improves computer translation by using summaries and key words.