DARO: Difficulty-Aware Reweighting Policy Optimization

Published: October 10, 2025 | arXiv ID: 2510.09001v1

By: Jingyu Zhou, Lu Ma, Hao Liang and more

Potential Business Impact:

Teaches AI to solve math problems better.

Business Areas:

A/B Testing, Data and Analytics

Recent advances in large language models (LLMs) have shown that reasoning ability can be significantly enhanced through Reinforcement Learning with Verifiable Rewards (RLVR). Group Relative Policy Optimization (GRPO) has emerged as the de facto approach for RLVR, inspiring numerous variants. However, our mathematical analysis reveals that these methods are fundamentally weighted variations of GRPO. We provide a unified view, demonstrating that their reliance on static or overly simplistic weighting schemes tied to sample difficulty prevents adaptation to a model's evolving capabilities. This creates a significant loss scale issue, where training disproportionately focuses on certain difficulty levels at the expense of others, hindering overall performance. To address these limitations, we introduce Difficulty-Aware Reweighting Policy Optimization (DARO), a method that dynamically adjusts the loss contribution of each difficulty group based on the model's learning state. Extensive experiments on Qwen2.5-Math-1.5B, Qwen2.5-Math-7B, and Llama3.1-8B show that DARO outperforms four leading baselines across six math benchmarks, achieving significantly faster convergence and superior final performance.
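The claim that GRPO-style methods implicitly weight prompts by difficulty can be illustrated with a small calculation. The sketch below assumes binary 0/1 verifiable rewards and the common GRPO group-relative advantage (reward minus the group mean, optionally divided by the group's standard deviation; a population standard deviation is assumed here, though implementations differ). It uses the mean absolute advantage in a group as a rough proxy for that prompt's loss contribution. The function names are hypothetical, and this is not DARO's reweighting rule. For a group with accuracy p, the std-normalized version works out to 2√(p(1−p)) and the mean-only version to 2p(1−p), so both peak at p = 0.5 and vanish when every sample is right or every sample is wrong.

```python
import math
from typing import List


def grpo_advantages(rewards: List[float], normalize_std: bool = True,
                    eps: float = 1e-6) -> List[float]:
    """Group-relative advantages (hypothetical helper): subtract the group
    mean reward and, optionally, divide by the group's population std."""
    n = len(rewards)
    mean = sum(rewards) / n
    centered = [r - mean for r in rewards]
    if not normalize_std:
        return centered
    std = math.sqrt(sum(c * c for c in centered) / n)
    # eps avoids division by zero when all rewards in the group are equal
    return [c / (std + eps) for c in centered]


def mean_abs_advantage(num_correct: int, group_size: int,
                       normalize_std: bool = True) -> float:
    """Average |advantage| for a group with binary rewards; used here only as
    an assumed proxy for how strongly this prompt drives the gradient."""
    rewards = [1.0] * num_correct + [0.0] * (group_size - num_correct)
    adv = grpo_advantages(rewards, normalize_std)
    return sum(abs(a) for a in adv) / group_size


if __name__ == "__main__":
    G = 8  # assumed number of sampled responses per prompt
    for k in range(G + 1):
        p = k / G
        print(f"accuracy={p:.3f}  "
              f"std-normalized={mean_abs_advantage(k, G):.3f}  "
              f"mean-only={mean_abs_advantage(k, G, normalize_std=False):.3f}")
```

Both curves are fixed functions of group accuracy and do not change as the model improves, which is the kind of static, difficulty-tied weighting that DARO is described as replacing with weights driven by the model's learning state.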

DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO

CV and Pattern Recognition

Helps AI understand videos better by learning smarter.

9 Jun 2025

G²RPO-A: Guided Group Relative Policy Optimization with Adaptive Guidance

Artificial Intelligence

Helps small AI learn to think better.

18 Aug 2025

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

Artificial Intelligence

Teaches AI to think through problems, not just copy.

17 Mar 2025

Page Count:

15 pages
