Generative Actor Critic

Published: December 25, 2025 | arXiv ID: 2512.21527v1

By: Aoyang Qin, Deqian Kong, Wei Wang, and more

Conventional reinforcement learning (RL) algorithms, which typically estimate or maximize expected returns, struggle to refine offline pretrained models with online experience. This paper introduces Generative Actor Critic (GAC), a framework that decouples sequential decision-making by reframing policy evaluation as learning a generative model of the joint distribution over trajectories and returns, p(τ, y), and policy improvement as performing versatile inference on this learned model. To operationalize GAC, we introduce a specific instantiation based on a latent-variable model with continuous latent plan vectors. We develop novel inference strategies for both exploitation, which optimizes latent plans to maximize expected returns, and exploration, which samples latent plans conditioned on dynamically adjusted target returns. Experiments on the Gym-MuJoCo and Maze2D benchmarks demonstrate GAC's strong offline performance and significantly enhanced offline-to-online improvement compared to state-of-the-art methods, even in the absence of step-wise rewards.
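The abstract describes two inference modes on a learned latent-plan model: optimizing a latent plan to maximize predicted return (exploitation) and drawing latent plans consistent with a target return (exploration). The PyTorch sketch below illustrates one plausible reading of that idea; everything in it (the PlanModel class, the return_head and action_head names, all dimensions and hyperparameters, and the use of gradient-based conditioning with a Gaussian-prior penalty in place of true posterior sampling) is an illustrative assumption, not the paper's actual implementation.

import torch
import torch.nn as nn

class PlanModel(nn.Module):
    """Toy stand-in for a trained latent-variable model p(tau, y | z)."""
    def __init__(self, z_dim=16, act_dim=6, horizon=32):
        super().__init__()
        # z -> action sequence (a flattened trajectory tau)
        self.action_head = nn.Sequential(
            nn.Linear(z_dim, 128), nn.ReLU(),
            nn.Linear(128, horizon * act_dim))
        # z -> predicted return y
        self.return_head = nn.Sequential(
            nn.Linear(z_dim, 64), nn.ReLU(),
            nn.Linear(64, 1))

def exploit(model, z_dim=16, steps=50, lr=0.1):
    """Exploitation: gradient-ascend a latent plan z to maximize E[y | z]."""
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -model.return_head(z).mean()  # maximize the predicted return
        loss.backward()
        opt.step()
    return z.detach()

def explore(model, target_return, z_dim=16, steps=50, lr=0.1, prior_weight=0.5):
    """Exploration: find a latent plan whose predicted return matches a
    (dynamically adjusted) target, regularized toward the N(0, I) prior."""
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        fit = (model.return_head(z) - target_return) ** 2  # hit the target return
        prior = prior_weight * (z ** 2).sum()              # stay near the prior
        (fit.mean() + prior).backward()
        opt.step()
    return z.detach()

model = PlanModel()
z_exploit = exploit(model)
z_explore = explore(model, target_return=torch.tensor([[100.0]]))
actions = model.action_head(z_exploit).view(32, 6)  # decoded action plan

Because evaluation lives entirely in the generative model p(τ, y), both modes here touch only the latent plan z and never the model weights, which is the decoupling of evaluation and improvement the abstract emphasizes.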

Category
Computer Science:
Machine Learning (CS)