Growing with the Generator: Self-paced GRPO for Video Generation

Published: November 24, 2025 | arXiv ID: 2511.19356v1

By: Rui Li, Yuanzhi Liang, Ziqi Ni, and more

Potential Business Impact:

Improves the quality of AI-generated videos by letting the reward signal adapt as the generator learns.

Business Areas:
Image Recognition, Data and Analytics, Software

Group Relative Policy Optimization (GRPO) has emerged as a powerful reinforcement learning paradigm for post-training video generation models. However, existing GRPO pipelines rely on static, fixed-capacity reward models whose evaluation behavior is frozen during training. Such rigid rewards introduce distributional bias, saturate quickly as the generator improves, and ultimately limit the stability and effectiveness of reinforcement-based alignment. We propose Self-Paced GRPO, a competence-aware GRPO framework in which reward feedback co-evolves with the generator. Our method introduces a progressive reward mechanism that automatically shifts its emphasis from coarse visual fidelity to temporal coherence and fine-grained text-video semantic alignment as generation quality increases. This self-paced curriculum alleviates reward-policy mismatch, mitigates reward exploitation, and yields more stable optimization. Experiments on VBench across multiple video generation backbones demonstrate consistent improvements in both visual quality and semantic alignment over GRPO baselines with static rewards, validating the effectiveness and generality of Self-Paced GRPO.
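
The core idea in the abstract, a reward whose emphasis shifts from coarse visual fidelity toward temporal coherence and text-video alignment as the generator's competence grows, combined with group-relative advantage normalization, can be illustrated with a minimal sketch. The weight schedule, the three reward channels, and the competence value below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def competence_weights(competence: float) -> np.ndarray:
    """Interpolate reward weights from coarse fidelity toward temporal
    coherence and text-video alignment as competence rises from 0 to 1.
    (Illustrative linear schedule; the paper's schedule may differ.)"""
    c = float(np.clip(competence, 0.0, 1.0))
    early = np.array([0.8, 0.1, 0.1])   # early training: emphasize visual fidelity
    late  = np.array([0.2, 0.4, 0.4])   # later: emphasize coherence and alignment
    w = (1.0 - c) * early + c * late
    return w / w.sum()

def self_paced_reward(fidelity, coherence, alignment, competence):
    """Blend the three reward channels with the competence-aware weights."""
    w = competence_weights(competence)
    return w[0] * fidelity + w[1] * coherence + w[2] * alignment

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantages: normalize each sample's reward against
    the statistics of its own group of generations for the same prompt."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: one prompt, a group of 4 generated videos scored by three
# hypothetical reward channels, with the generator judged 30% competent.
fidelity  = np.array([0.62, 0.71, 0.55, 0.68])
coherence = np.array([0.40, 0.52, 0.33, 0.47])
alignment = np.array([0.35, 0.44, 0.30, 0.41])

rewards = self_paced_reward(fidelity, coherence, alignment, competence=0.3)
advantages = grpo_advantages(rewards)
print(advantages)  # positive for videos above the group average
```

In this reading, the "self-paced curriculum" amounts to re-weighting what the reward model scores as the policy improves, so the group-relative advantages keep discriminating between samples instead of saturating on an already-mastered criterion.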

Country of Origin
🇨🇳 China

Page Count
16 pages

Category
Computer Science:
CV and Pattern Recognition