Bidirectional Soft Actor-Critic: Leveraging Forward and Reverse KL Divergence for Efficient Reinforcement Learning
By: Yixian Zhang, Huaze Tang, Changxu Wei, and more
Potential Business Impact:
Teaches robots and simulated control systems to learn tasks faster, reaching up to 30% higher rewards on benchmarks.
The Soft Actor-Critic (SAC) algorithm, a state-of-the-art method in maximum entropy reinforcement learning, traditionally relies on minimizing reverse Kullback-Leibler (KL) divergence for policy updates. However, this approach leads to an intractable optimal projection policy, necessitating gradient-based approximations that can suffer from instability and poor sample efficiency. This paper investigates the alternative use of forward KL divergence within SAC. We demonstrate that for Gaussian policies, forward KL divergence yields an explicit optimal projection policy -- corresponding to the mean and variance of the target Boltzmann distribution's action marginals. Building on the distinct advantages of both KL directions, we propose Bidirectional SAC, an algorithm that first initializes the policy using the explicit forward KL projection and then refines it by optimizing the reverse KL divergence. Comprehensive experiments on continuous control benchmarks show that Bidirectional SAC significantly outperforms standard SAC and other baselines, achieving up to a $30\%$ increase in episodic rewards, alongside enhanced sample efficiency.
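To make the two-stage update concrete, here is a minimal sketch, not the authors' implementation, of how such a bidirectional policy step could look. Assumed interfaces (not from the paper): `policy(states)` returns the `(mean, log_std)` of a Gaussian policy, `q_net(states, actions)` returns soft Q-values, and `alpha` is the entropy temperature. The explicit forward-KL projection is approximated here by self-normalized importance-weighted moment matching against the Boltzmann target, and the tanh squashing used in practical SAC is omitted for brevity.

```python
# Hypothetical sketch of a Bidirectional-SAC style policy update:
# Stage 1 matches the Gaussian to the Boltzmann target's action moments
# (forward KL); Stage 2 refines with the standard SAC objective (reverse KL).
import torch

def bidirectional_policy_update(policy, q_net, policy_opt, states,
                                alpha=0.2, n_proposal=64):
    B = states.shape[0]
    act_dim = policy(states)[0].shape[-1]

    # ---- Stage 1: forward-KL projection (match Boltzmann action moments) ----
    with torch.no_grad():
        mean, log_std = policy(states)
        std = log_std.exp()
        # Proposal actions drawn around the current policy, (n_proposal, B, act_dim)
        actions = mean + std * torch.randn(n_proposal, B, act_dim)
        flat_s = states.unsqueeze(0).expand(n_proposal, B, -1).reshape(n_proposal * B, -1)
        q = q_net(flat_s, actions.reshape(n_proposal * B, -1)).reshape(n_proposal, B, -1)
        # Self-normalized weights toward p(a|s) proportional to exp(Q(s,a)/alpha);
        # a fuller treatment would also correct for the proposal density.
        w = torch.softmax(q / alpha, dim=0)
        target_mean = (w * actions).sum(0)
        target_var = (w * (actions - target_mean) ** 2).sum(0)

    # Initialize the Gaussian toward the matched moments with a regression step.
    mean, log_std = policy(states)
    init_loss = ((mean - target_mean) ** 2).mean() \
        + ((log_std.exp() ** 2 - target_var) ** 2).mean()
    policy_opt.zero_grad(); init_loss.backward(); policy_opt.step()

    # ---- Stage 2: reverse-KL refinement (standard SAC policy objective) ----
    mean, log_std = policy(states)
    dist = torch.distributions.Normal(mean, log_std.exp())
    a = dist.rsample()                                  # reparameterized sample
    log_prob = dist.log_prob(a).sum(-1, keepdim=True)
    sac_loss = (alpha * log_prob - q_net(states, a)).mean()
    policy_opt.zero_grad(); sac_loss.backward(); policy_opt.step()
    return init_loss.item(), sac_loss.item()
```

The split mirrors the abstract's motivation: the forward-KL stage gives a cheap, explicit move toward the target distribution's mean and variance, while the reverse-KL stage keeps the usual mode-seeking SAC gradient that ultimately determines the deployed policy.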
Similar Papers
DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty
Machine Learning (CS)
Makes robots learn better even when things change.
Causal Policy Learning in Reinforcement Learning: Backdoor-Adjusted Soft Actor-Critic
Machine Learning (CS)
Teaches robots to learn better from their mistakes.
A Diffusion Model Framework for Maximum Entropy Reinforcement Learning
Machine Learning (CS)
Makes robots learn tasks faster and better.