What question did this study set out to answer?

This research aims to improve urban traffic signal control by addressing policy homogenization and inefficient credit assignment in multi-agent systems.

May 24, 2026 · Open Access

Multi-Agent Deep Reinforcement Learning with Contrastive Policy Diversification and Hierarchical Graph Networks for Urban Traffic Signal Control

Key Points

Developed the Multi-Agent Hierarchical Contrastive Learning Traffic Signal Control model (MAHCL-TSC) to enhance state representation.
Implemented a hierarchical graph convolutional credit allocation network for more accurate value estimation.
Conducted experiments in the Simulation of Urban Mobility (SUMO) environment on 4 × 4 and 6 × 6 synthetic grid networks.
The MAHCL-TSC model improved traffic signal control performance under the tested structured simulation settings.
Results indicated more precise credit assignment through structure-aware collaborative value estimation.
The model shows potential scalability as network sizes increase.
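
To make the grid setting concrete, the sketch below builds the adjacency of a 4 × 4 grid of intersections and runs one unweighted neighbourhood-averaging step, which is the propagation part of a graph convolution. It is an illustration only: the function names are hypothetical, there are no learned weights, and it does not reproduce the hierarchical credit allocation network described in the paper.

```python
def grid_adjacency(rows, cols):
    """Return {node: [neighbour nodes]} for a rows x cols grid (4-neighbourhood)."""
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            adj = []
            if r > 0:
                adj.append(node - cols)   # intersection above
            if r < rows - 1:
                adj.append(node + cols)   # intersection below
            if c > 0:
                adj.append(node - 1)      # intersection to the left
            if c < cols - 1:
                adj.append(node + 1)      # intersection to the right
            neighbors[node] = adj
    return neighbors


def mean_aggregate(features, neighbors):
    """One propagation step: each node averages its own features with its neighbours'."""
    out = []
    for node in range(len(features)):
        group = [node] + neighbors[node]
        dim = len(features[node])
        out.append([sum(features[j][k] for j in group) / len(group) for k in range(dim)])
    return out


# Hypothetical one-dimensional feature per intersection (for example, a queue length).
adj = grid_adjacency(4, 4)
features = [[float(i)] for i in range(16)]
smoothed = mean_aggregate(features, adj)
```

In a learned graph convolution, the averaged vector would additionally pass through a trainable weight matrix and a nonlinearity; that part is omitted here.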

Abstract

Multi-Agent Reinforcement Learning (MARL) provides an effective approach for urban multi-intersection traffic signal control. However, existing methods face two fundamental challenges: policy homogenization and inefficient credit assignment. The former leads to convergent agent policies that fail to adapt to heterogeneous traffic patterns, while the latter prevents agents from accurately evaluating their individual contributions to system performance. To address these issues, this paper proposes a Multi-Agent Hierarchical Contrastive Learning Traffic Signal Control (MAHCL-TSC) model. The model incorporates an unsupervised contrastive learning module that enhances the discriminative power of state representations, thereby alleviating policy homogenization. It also includes a hierarchical graph convolutional credit allocation network that leverages road network topology and functional characteristics to enable structure-aware collaborative value estimation, significantly improving the precision of credit assignment. Based on these components, a Contrastive QTRAN with Hierarchical Graph Convolution (CQTRAN-HGC) algorithm is proposed, which jointly optimizes a contrastive learning loss and a QTRAN constraint loss. Experiments conducted in the Simulation of Urban Mobility (SUMO) environment on 4 × 4 and 6 × 6 synthetic grid networks demonstrate that the proposed model improves traffic signal control performance under the tested structured simulation settings and shows potential scalability as the network size increases.
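
The joint optimization mentioned in the abstract can be pictured with a small sketch. The abstract does not say which contrastive objective or loss weighting the authors use, so the code below assumes an InfoNCE-style loss over cosine similarities and a simple weighted sum. The QTRAN loss is treated as an already computed scalar, and `temperature` and `weight` are hypothetical parameters.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss: low when the anchor is closer to the positive than to the negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    peak = max(logits)  # subtract the maximum for a numerically stable log-sum-exp
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[0]


def joint_loss(qtran_loss, contrastive_loss, weight=0.1):
    """Assumed combination: QTRAN constraint loss plus a weighted contrastive term."""
    return qtran_loss + weight * contrastive_loss


# Hypothetical two-dimensional state embeddings.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
total = joint_loss(qtran_loss=1.0, contrastive_loss=aligned, weight=0.1)
```

Under this assumed form, the contrastive term pushes embeddings of different traffic states apart, while the weight controls how strongly that pressure competes with the value-factorization objective.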

Cite This Study

Yan et al. studied this question.

synapsesocial.com/papers/6a12960648a0ea1665672811 https://doi.org/10.3390/ijgi15060229
