What question did this study set out to answer?

The aim is to create an adaptive traffic signal system using deep reinforcement learning to enhance urban traffic flow.

April 21, 2026 | Open Access

TrafficOpt RL: Adaptive Traffic Signal Optimization Using Deep Reinforcement Learning

Key Points

The aim is to create an adaptive traffic signal system using deep reinforcement learning to enhance urban traffic flow.
Developed TrafficOpt RL system employing the Deep Q-Network algorithm.
Simulated a four-way intersection with stochastic vehicle arrivals using a Gymnasium-compatible environment.
Evaluated system performance against fixed-timing signals across multiple metrics.
TrafficOpt RL significantly reduced average vehicle waiting times compared to fixed-timing systems.
Measured improvements in total intersection throughput and composite efficiency scores.
The evaluation produced three analytical visualizations to illustrate system performance.
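To make the simulation and baseline comparison described in the points above concrete, here is a minimal sketch of a four-approach intersection with Poisson arrivals, written in the style of the Gymnasium `reset`/`step` interface but using only the Python standard library. Everything in it is an illustrative assumption rather than the paper's code: the class name `ToyIntersection`, the arrival rates, the two-phase action space, the one-vehicle-per-step service rate, and the use of total queue length as a proxy for waiting time. The `longest_queue` rule is a hand-written stand-in for an adaptive policy, not the trained DQN agent.

```python
import math
import random


class ToyIntersection:
    """Hypothetical four-approach intersection (not the paper's environment)."""

    def __init__(self, arrival_rates=(0.3, 0.3, 0.2, 0.2), service_per_step=1, seed=0):
        self.arrival_rates = arrival_rates        # assumed mean arrivals per step, per approach
        self.service_per_step = service_per_step  # assumed vehicles released per green approach
        self.rng = random.Random(seed)
        self.queues = [0, 0, 0, 0]

    def _poisson(self, lam):
        # Knuth's multiplication method for sampling a Poisson variate.
        threshold = math.exp(-lam)
        k, p = 0, 1.0
        while True:
            p *= self.rng.random()
            if p <= threshold:
                return k
            k += 1

    def reset(self):
        self.queues = [0, 0, 0, 0]
        return tuple(self.queues), {}

    def step(self, action):
        # Action 0 gives green to approaches 0 and 1; action 1 to approaches 2 and 3.
        green = (0, 1) if action == 0 else (2, 3)
        served = 0
        for i in green:
            out = min(self.queues[i], self.service_per_step)
            self.queues[i] -= out
            served += out
        arrived = 0
        for i, lam in enumerate(self.arrival_rates):
            n = self._poisson(lam)
            self.queues[i] += n
            arrived += n
        waiting = sum(self.queues)
        reward = -waiting  # assumed reward: penalize vehicles still queued
        info = {"served": served, "arrived": arrived, "waiting": waiting}
        return tuple(self.queues), reward, False, False, info


def fixed_timing(t, obs, period=10):
    # Baseline: alternate phases every `period` steps, ignoring the queues.
    return (t // period) % 2


def longest_queue(t, obs):
    # Stand-in adaptive rule: serve whichever phase has more queued vehicles.
    return 0 if obs[0] + obs[1] >= obs[2] + obs[3] else 1


def run_policy(env, policy, steps=1000):
    """Return (mean total queue length per step, total vehicles served)."""
    obs, _ = env.reset()
    total_waiting, total_served = 0, 0
    for t in range(steps):
        obs, _, _, _, info = env.step(policy(t, obs))
        total_waiting += info["waiting"]
        total_served += info["served"]
    return total_waiting / steps, total_served


if __name__ == "__main__":
    for name, policy in (("fixed", fixed_timing), ("adaptive", longest_queue)):
        avg_queue, served = run_policy(ToyIntersection(seed=42), policy)
        print(f"{name}: mean queue={avg_queue:.2f}, throughput={served}")
```

Both policies are run on environments created with the same seed, but their arrival sequences are not guaranteed to match exactly. A careful comparison would average the metrics over many seeds.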

Abstract

Urban traffic congestion is a critical infrastructure challenge facing modern cities as vehicle populations expand and urban density increases. Conventional fixed-timing traffic signal systems are incapable of adapting to the stochastic and dynamic nature of real-world traffic flows, resulting in wasted green-light time, queue buildup, increased vehicle emissions, and emergency response delays. This paper presents TrafficOpt RL, an end-to-end adaptive traffic signal optimization system that applies the Deep Q-Network (DQN) algorithm to learn intelligent signaling policies at urban intersections through iterative simulation experience. The system is built on a custom Gymnasium-compatible simulation environment modeling a four-way intersection with stochastic Poisson vehicle arrivals. The DQN agent, implemented via the Stable-Baselines3 framework, utilizes experience replay, target network stabilization, and epsilon-greedy exploration to converge on policies minimizing aggregate vehicle waiting times and maximizing intersection throughput. All training metrics and simulation data are persistently stored in a MySQL relational database through automated callback logging, enabling systematic performance analysis. Evaluation via direct comparison against a fixed-timing baseline demonstrates measurable superiority of the reinforcement learning approach across three performance dimensions: average vehicle waiting time, total throughput, and composite efficiency score. Three analytical visualizations are generated to communicate system performance. TrafficOpt RL constitutes a practical proof-of-concept for deep reinforcement learning integration into intelligent transportation systems and smart city infrastructure.
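Two of the DQN ingredients named in the abstract, epsilon-greedy exploration and the target-network update, can be illustrated in a few lines. The sketch below is generic DQN arithmetic and not the paper's implementation: the function names, the discount factor `gamma=0.99`, and the plain-list Q-values are assumptions made for illustration. In practice, a library such as Stable-Baselines3 performs these steps internally.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_target, gamma=0.99, done=False):
    """One-step DQN target: r + gamma * max_a' Q_target(s', a').

    `next_q_target` holds the target network's Q-values for the next state;
    evaluating it with a slowly updated copy of the network is what
    "target network stabilization" refers to.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_target)


# Example: with epsilon = 0 the choice is purely greedy.
action = epsilon_greedy([0.1, 0.9], epsilon=0.0)
target = td_target(reward=-5.0, next_q_target=[1.0, 2.0], gamma=0.5)
```

Here the reward is assumed to be negative when vehicles are waiting, so maximizing the discounted return corresponds to reducing queues over time.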
