
Dynamic Jointly Batch Selection for Data Efficient Machine Translation Fine-Tuning

Published: November 6, 2025 | arXiv ID: 2511.04406v1

By: Mohammad Amin Ghanizadeh, Mohammad Javad Dousti

Potential Business Impact:

Makes computer translations much better and faster.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Data quality and effective data selection are fundamental to improving the performance of machine translation models and are cornerstones of robust, reliable translation systems. This paper presents a data selection methodology designed specifically for fine-tuning machine translation systems. It leverages the interplay between a learner model and a pre-trained reference model to make training more effective. By defining a learnability score, the approach systematically evaluates how useful each data point is for training, so that only the most relevant and impactful examples contribute to fine-tuning. The method also employs a batch selection strategy that accounts for interdependencies among data points, making training more efficient while keeping the focus on data relevance. Experiments on English-to-Persian and several other language pairs, using an mBART model fine-tuned on the CCMatrix dataset, show that the method achieves up to a fivefold improvement in data efficiency compared with an i.i.d. (random sampling) baseline. Experimental results indicate that the approach improves computational efficiency by 24 when utilizing cached embeddings, because it requires fewer training data points. It also improves generalization, yielding better translation performance than random selection.
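
The abstract does not spell out the learnability formula or the batch selection procedure. The Python sketch below assumes the definition common in related work (learnability = learner loss minus reference-model loss, as in RHO-LOSS and JEST-style selection). It then swaps in a simple greedy, redundancy-aware selection over cached sentence embeddings to illustrate one thing "interdependencies among data points" can mean. The function names, the cosine-similarity penalty, and `redundancy_weight` are hypothetical; this is not the authors' algorithm.

```python
import math
from typing import List, Sequence


def learnability_scores(learner_losses: Sequence[float],
                        reference_losses: Sequence[float]) -> List[float]:
    """Assumed learnability: high when the learner still finds an example hard
    (high loss) but the pre-trained reference model finds it easy (low loss)."""
    if len(learner_losses) != len(reference_losses):
        raise ValueError("loss lists must have the same length")
    return [l - r for l, r in zip(learner_losses, reference_losses)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two cached sentence embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def select_batch(scores: Sequence[float],
                 embeddings: Sequence[Sequence[float]],
                 batch_size: int,
                 redundancy_weight: float = 0.5) -> List[int]:
    """Hypothetical joint selection: greedily pick `batch_size` indices from a
    larger candidate pool. Each step adds the candidate whose learnability,
    minus a penalty proportional to its highest similarity to an already chosen
    example, is largest. Ties go to the lowest index."""
    chosen: List[int] = []
    remaining = set(range(len(scores)))

    def gain(i: int) -> float:
        if not chosen:
            return scores[i]
        overlap = max(cosine(embeddings[i], embeddings[j]) for j in chosen)
        return scores[i] - redundancy_weight * overlap

    while remaining and len(chosen) < batch_size:
        best = max(sorted(remaining), key=gain)
        chosen.append(best)
        remaining.remove(best)
    return chosen


scores = learnability_scores([3.0, 3.0, 1.5, 2.5], [1.0, 1.5, 1.0, 2.0])
emb = [[1, 0], [1, 0], [0, 1], [1, 1]]  # examples 0 and 1 are near-duplicates
print(select_batch(scores, emb, 2, redundancy_weight=0.0))
print(select_batch(scores, emb, 2, redundancy_weight=2.0))
```

Here the learnability scores are `[2.0, 1.5, 0.5, 0.5]`. With `redundancy_weight=0.0` the selection reduces to plain top-k and returns `[0, 1]`; with `redundancy_weight=2.0` it skips the near-duplicate of example 0 and returns `[0, 2]`. In a dynamic setting, the learner losses would be recomputed as training proceeds, so the scores and the selected batch would change from step to step.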


Country of Origin

🇮🇷 Iran

Page Count

9 pages
