Using a single actor to output personalized policy for different intersections
By: Kailing Zhou, Chengwei Zhang, Furui Zhan, and more
Potential Business Impact:
Makes traffic lights smarter for smoother traffic flow.
Recently, with the development of multi-agent reinforcement learning (MARL), adaptive traffic signal control (ATSC) has achieved satisfactory results. In traffic scenarios with multiple intersections, MARL treats each intersection as an agent and optimizes traffic signal control strategies through learning and real-time decision-making. Because the observation distributions of intersections can differ in real-world scenarios, parameter-sharing methods may lack diversity, which places high generalization demands on the shared policy network. A typical remedy is to increase the number of network parameters; however, our experiments show that simply scaling up the network does not necessarily improve policy generalization. Accordingly, an approach is needed that balances the personalization of intersections with the efficiency of parameter sharing. To this end, we propose Hyper-Action Multi-Head Proximal Policy Optimization (HAMH-PPO), a Centralized Training with Decentralized Execution (CTDE) MARL method that uses a shared PPO policy network to deliver personalized policies for intersections with non-i.i.d. observation distributions. The centralized critic in HAMH-PPO uses graph attention units to compute graph representations of all intersections and, through multiple output heads, produces a set of value estimates for each intersection. The decentralized actor takes the local observation history as input and outputs an action distribution together with a so-called hyper-action that balances the multiple value estimates from the centralized critic, further guiding the update of each intersection's TSC policy. The combination of hyper-action and multi-head values enables multiple agents to share a single actor-critic while still learning personalized policies.
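The actor-critic layout described above can be summarized in code. The following PyTorch sketch is only an illustration of the stated design, a shared actor that emits an action distribution plus a hyper-action, and a centralized critic with K value heads; all layer sizes and class names are assumptions, and nn.MultiheadAttention merely stands in for the paper's graph attention units. It is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedActor(nn.Module):
    """Single actor shared by all intersections: from the local observation
    history it outputs an action distribution and a hyper-action, i.e. a
    softmax weighting over the critic's K value heads."""
    def __init__(self, obs_dim, n_actions, k_heads, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(obs_dim, hidden, batch_first=True)
        self.action_head = nn.Linear(hidden, n_actions)
        self.hyper_head = nn.Linear(hidden, k_heads)

    def forward(self, obs_history):               # (batch, time, obs_dim)
        _, h = self.encoder(obs_history)          # h: (1, batch, hidden)
        h = h.squeeze(0)
        action_logits = self.action_head(h)       # per-intersection policy
        hyper_action = F.softmax(self.hyper_head(h), dim=-1)  # weights over K heads
        return action_logits, hyper_action

class CentralizedCritic(nn.Module):
    """Centralized critic: attends over all intersections (a stand-in for the
    paper's graph attention units) and emits K values per intersection."""
    def __init__(self, obs_dim, k_heads, hidden=64):
        super().__init__()
        self.embed = nn.Linear(obs_dim, hidden)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.value_heads = nn.Linear(hidden, k_heads)

    def forward(self, all_obs):                   # (batch, n_agents, obs_dim)
        x = self.embed(all_obs)
        x, _ = self.attn(x, x, x)                 # message passing over intersections
        return self.value_heads(x)                # (batch, n_agents, K)

# Example wiring with made-up dimensions:
actor = SharedActor(obs_dim=12, n_actions=8, k_heads=4)
critic = CentralizedCritic(obs_dim=12, k_heads=4)
obs_hist = torch.randn(16, 10, 12)                # one agent's observation history
all_obs = torch.randn(16, 5, 12)                  # global state of 5 intersections
logits, w = actor(obs_hist)                       # w: (16, 4) hyper-action
values = critic(all_obs)                          # (16, 5, 4) K values per agent
v_agent0 = (values[:, 0, :] * w).sum(-1)          # personalized value for agent 0
```

The last line shows the key idea: the per-agent value is the hyper-action-weighted sum of the K head outputs, so a single shared critic can yield a different effective value estimate, and hence a different policy update, for each intersection.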
Similar Papers
Adaptive Traffic Signal Control based on Multi-Agent Reinforcement Learning. Case Study on a simulated real-world corridor
Multiagent Systems
Makes traffic lights smarter, reducing car waits.
Semi Centralized Training Decentralized Execution Architecture for Multi Agent Deep Reinforcement Learning in Traffic Signal Control
Multiagent Systems
Makes traffic lights smarter for smoother driving.
A Hierarchical Signal Coordination and Control System Using a Hybrid Model-based and Reinforcement Learning Approach
Systems and Control
Makes traffic lights smarter to reduce jams.