Distributional Soft Actor-Critic with Harmonic Gradient for Safe and Efficient Autonomous Driving in Multi-lane Scenarios

Published: May 18, 2025 | arXiv ID: 2505.13532v1

By: Feihong Zhang, Guojian Zhan, Bin Shuai, and more

Potential Business Impact:

Trains self-driving cars to drive efficiently in multi-lane traffic while keeping safety violations near zero.

Business Areas:
Autonomous Vehicles, Transportation

Reinforcement learning (RL), known for its self-evolution capability, offers a promising approach to training high-level autonomous driving systems. However, handling constraints remains a significant challenge for existing RL algorithms, particularly in real-world applications. In this paper, we propose a new safety-oriented training technique called harmonic policy iteration (HPI). At each RL iteration, it first calculates two policy gradients, associated with efficient driving and safety constraints respectively. A harmonic gradient is then derived for the policy update, minimizing conflicts between the two gradients and consequently enabling a more balanced and stable training process. Furthermore, we adopt the state-of-the-art Distributional Soft Actor-Critic (DSAC) algorithm as the backbone and integrate it with our HPI to develop a new safe RL algorithm, DSAC-H. Extensive simulations in multi-lane scenarios demonstrate that DSAC-H achieves efficient driving performance with near-zero safety constraint violations.
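The abstract describes HPI only at a high level: compute a gradient for driving efficiency and a gradient for the safety constraint, then combine them so the update conflicts with neither. Below is a minimal sketch of one way such a conflict-aware combination could look, in the spirit of gradient-surgery methods. The function name `harmonic_gradient`, the projection-then-average rule, and the example vectors are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def harmonic_gradient(g_task: np.ndarray, g_safe: np.ndarray,
                      eps: float = 1e-12) -> np.ndarray:
    """Combine task and safety policy gradients into one update direction.

    Hypothetical sketch: if the two gradients conflict (negative inner
    product), remove from each the component that opposes the other
    (projections computed from the original gradients), then average.
    If they agree, simply average them.
    """
    dot = float(np.dot(g_task, g_safe))
    if dot < 0.0:
        g_task_adj = g_task - (dot / (np.dot(g_safe, g_safe) + eps)) * g_safe
        g_safe_adj = g_safe - (dot / (np.dot(g_task, g_task) + eps)) * g_task
    else:
        g_task_adj, g_safe_adj = g_task, g_safe
    return 0.5 * (g_task_adj + g_safe_adj)

# Example: two conflicting 2-D gradients.
g_eff = np.array([1.0, 0.0])    # gradient for driving efficiency
g_con = np.array([-0.6, 0.8])   # gradient for the safety constraint

g = harmonic_gradient(g_eff, g_con)
print(g)                        # [0.32 0.64]
print(np.dot(g, g_eff) > 0)     # True: does not regress efficiency
print(np.dot(g, g_con) > 0)     # True: does not regress safety
```

In this toy example the raw gradients point partly against each other (their inner product is negative), yet the combined direction has a positive inner product with both, so a small step along it improves the efficiency objective without increasing constraint violation, which is the balance the paper's harmonic gradient is designed to achieve.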

Country of Origin
🇨🇳 China

Page Count
7 pages

Category
Computer Science:
Robotics