Rewarding Creativity: A Human-Aligned Generative Reward Model for Reinforcement Learning in Storytelling
By: Zhaoyan Li, Hang Lei, Yujia Wang and more
While Large Language Models (LLMs) can generate fluent text, producing high-quality creative stories remains challenging. Reinforcement Learning (RL) offers a promising solution but faces two critical obstacles: designing reliable reward signals for subjective storytelling quality and mitigating training instability. This paper introduces the Reinforcement Learning for Creative Storytelling (RLCS) framework to systematically address both challenges. First, we develop a Generative Reward Model (GenRM) that provides multi-dimensional analysis and explicit reasoning about story preferences, trained through supervised fine-tuning on demonstrations with reasoning chains distilled from strong teacher models, followed by GRPO-based refinement on expanded preference data. Second, we introduce an entropy-based reward shaping strategy that dynamically prioritizes learning on confident errors and uncertain correct predictions, preventing overfitting on already-mastered patterns. Experiments demonstrate that GenRM achieves 68% alignment with human creativity judgments, and RLCS significantly outperforms strong baselines including Gemini-2.5-Pro in overall story quality. This work provides a practical pipeline for applying RL to creative domains, effectively navigating the dual challenges of reward modeling and training stability.
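To make the entropy-based reward shaping idea concrete, the sketch below shows one plausible way to weight training signal by prediction confidence and correctness: confident errors and uncertain correct predictions receive larger weights, while confident correct (already-mastered) predictions are downweighted. This is a minimal illustration only; the function names, weighting scheme, and hyperparameters are assumptions for exposition and not taken from the paper.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy of a probability distribution over candidate outputs."""
    probs = np.clip(probs, 1e-12, 1.0)
    return -np.sum(probs * np.log(probs))

def shaped_reward(base_reward, probs, is_correct, alpha=1.0):
    """
    Illustrative entropy-based reward shaping (hypothetical scheme).

    - Confident errors (low entropy, incorrect) get a large weight,
      so the policy is pushed hard to correct them.
    - Uncertain correct predictions (high entropy, correct) are also
      upweighted, so the policy consolidates them.
    - Confident correct predictions (already-mastered patterns) are
      downweighted to avoid overfitting on them.
    """
    max_entropy = np.log(len(probs))          # entropy of a uniform distribution
    h = entropy(probs) / max_entropy          # normalized entropy in [0, 1]
    weight = alpha * (h if is_correct else 1.0 - h)
    return weight * base_reward

# Example: a confident but wrong preference over 3 candidate stories
probs = np.array([0.9, 0.05, 0.05])
print(shaped_reward(1.0, probs, is_correct=False))   # large shaping weight
print(shaped_reward(1.0, probs, is_correct=True))    # small shaping weight
```

Under this toy scheme, samples the reward model already handles confidently contribute little gradient signal, which mirrors the paper's stated goal of preventing overfitting on already-mastered patterns.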
Similar Papers
RLMR: Reinforcement Learning with Mixed Rewards for Creative Writing
Artificial Intelligence
Uses mixed reward signals so AI-written stories are both high quality and instruction-following.
PCGRLLM: Large Language Model-Driven Reward Design for Procedural Content Generation Reinforcement Learning
Artificial Intelligence
Uses large language models to design reward functions for procedural content generation agents.