Remedy-R: Generative Reasoning for Machine Translation Evaluation without Error Annotations

Published: December 21, 2025 | arXiv ID: 2512.18906v1

By: Shaomu Tan, Ryosuke Mitani, Ritvik Choudhary and more

Potential Business Impact:

Makes computer translations better and easier to understand.

Business Areas:

Text Analytics Data and Analytics, Software

Over the years, automatic machine translation (MT) metrics have hill-climbed benchmarks and reached strong, sometimes human-level, agreement with human ratings. Yet they remain black boxes, offering little insight into their decision-making and often failing on real-world out-of-distribution (OOD) inputs. We introduce Remedy-R, a reasoning-driven generative MT metric trained with reinforcement learning from pairwise translation preferences, without requiring error-span annotations or distillation from closed LLMs. Remedy-R produces step-by-step analyses of accuracy, fluency, and completeness, followed by a final score, enabling more interpretable assessments. With only 60K training pairs across two language pairs, Remedy-R remains competitive with top scalar metrics and GPT-4-based judges on WMT22-24 meta-evaluation, generalizes to other languages, and exhibits strong robustness on OOD stress tests. Moreover, Remedy-R models generate self-reflective feedback that can be reused for translation improvement. Building on this finding, we introduce Remedy-R Agent, a simple evaluate-revise pipeline that uses Remedy-R's evaluation analysis to refine translations. This agent consistently improves translation quality across diverse models, including Qwen2.5, ALMA-R, GPT-4o-mini, and Gemini-2.0-Flash, suggesting that Remedy-R's reasoning captures translation-relevant information and is practically useful.
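The evaluate-revise pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Evaluation` type, the `evaluate` and `revise` callables, the `max_rounds` limit, and the rule of keeping a revision only when its score improves are all assumptions made here for illustration. The abstract states only that the metric's analysis is fed back to refine translations.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Evaluation:
    """Hypothetical container for a reasoning metric's output."""
    analysis: str  # step-by-step critique (accuracy, fluency, completeness)
    score: float   # final quality score; higher is assumed to be better


def evaluate_revise(
    source: str,
    draft: str,
    evaluate: Callable[[str, str], Evaluation],
    revise: Callable[[str, str, str], str],
    max_rounds: int = 2,
) -> Tuple[str, Evaluation]:
    """Refine a translation using the evaluator's own analysis as feedback.

    `evaluate(source, translation)` stands in for the reasoning metric;
    `revise(source, translation, analysis)` stands in for the translation
    model. Both are placeholders supplied by the caller.
    """
    best, best_eval = draft, evaluate(source, draft)
    for _ in range(max_rounds):
        candidate = revise(source, best, best_eval.analysis)
        cand_eval = evaluate(source, candidate)
        # Assumed acceptance rule: stop as soon as a revision fails to improve.
        if cand_eval.score <= best_eval.score:
            break
        best, best_eval = candidate, cand_eval
    return best, best_eval
```

In this sketch the evaluator and the reviser are passed in as plain functions, so the loop can be exercised with toy stand-ins before any real metric or translation model is wired in.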

Remedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling

Computation and Language

Checks if translations are good, even bad ones.

18 Apr 2025 2

87%

AutoRubric-R1V: Rubric-Based Generative Rewards for Faithful Multimodal Reasoning

Computation and Language

Teaches AI to think step-by-step, not just guess.

16 Oct 2025 2

87%

Reasoning-Intensive Regression

Computation and Language

Helps computers find hidden numbers in text.

29 Aug 2025 1

Country of Origin

🇳🇱 Netherlands

Page Count

23 pages
