What question did this study set out to answer?

The study aims to develop a generalisable and explainable method for traffic signal control (TSC) by integrating deep reinforcement learning (DRL) with large language models (LLMs).

March 10, 2026 | Open Access

Towards Generalisable and Explainable Traffic Signal Control via Deep Reinforcement Learning and Large Language Models

Key Points

The study aims to develop a generalisable and explainable traffic signal control (TSC) method that combines deep reinforcement learning (DRL) with large language models (LLMs).
The authors develop GenEx-TSC, which integrates DRL with LLMs for traffic signal control.
A DRL agent is trained from vehicle-level states, incorporating intersection physical heterogeneity and neighbourhood information.
The LLM agent is optimised in two stages: distillation from a larger-scale LLM agent, then alignment, in which the DRL evaluation network calibrates the distilled agent's outputs.
Ten intersection networks with different physical attributes are synthesised in SUMO, with traffic flows of varying scales.
GenEx-TSC shows advantages over traditional methods, mainstream DRL methods and LLM baselines in control accuracy, generalisation and explainability.
Results across diverse traffic environments indicate improved generalisation.
The method produces cycle-level signal timing strategies that are both efficient and interpretable.

Abstract

As a government‐regulated public service, traffic signal control (TSC) requires reliable and transparent decision‐making. However, existing deep reinforcement learning (DRL) methods, despite improvements in control accuracy, still lack explainability and generalisation, which severely limits their applicability in real‐world environments. To address these challenges, this paper proposes GenEx‐TSC, a generalisable and explainable TSC method that integrates DRL with large language models (LLMs). First, starting from vehicle‐level states, we train a DRL agent incorporating intersection physical heterogeneity and neighbourhood information, which lays the evaluation foundation for constructing a high‐quality LLM dataset. Subsequently, the LLM agent is optimised through a two‐stage training mechanism. In the distillation stage, a lightweight LLM agent is trained using the reasoning trajectories of a larger‐scale LLM agent, inheriting its semantic understanding and decision‐generation capabilities. In the alignment stage, the DRL evaluation network is employed to calibrate the outputs of the distilled LLM agent, ensuring that the generated cycle‐level signal timing strategies are both efficient and interpretable. We synthesise 10 intersection networks with different physical attributes in SUMO and set traffic flows of varying scales. Experimental results across diverse traffic environments demonstrate that the proposed GenEx‐TSC exhibits clear advantages over traditional methods, mainstream DRL methods and LLM baselines in terms of control accuracy, generalisation and explainability.
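
The abstract does not say how the DRL evaluation network calibrates the LLM agent in the alignment stage. The sketch below illustrates one plausible reading only: an evaluation function scores several candidate cycle-level timing plans, and the best and worst are kept as a preference pair that a later fine-tuning step could use. Everything in it is an assumption made for illustration, not the paper's implementation: the `TimingPlan` format, the `Candidate` class, the `toy_critic` stand-in for a trained evaluation network, and the `build_preference_pair` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical plan format: green seconds allocated to each phase in one cycle.
TimingPlan = Dict[str, int]


@dataclass
class Candidate:
    plan: TimingPlan
    rationale: str  # natural-language explanation that accompanies the plan


def toy_critic(queues: Dict[str, int], plan: TimingPlan) -> float:
    """Stand-in for a trained DRL evaluation network (NOT the paper's model).

    Returns a higher score when the green split matches the queue split.
    """
    total_queue = sum(queues.values()) or 1
    total_green = sum(plan.values()) or 1
    mismatch = sum(
        abs(queues[phase] / total_queue - plan.get(phase, 0) / total_green)
        for phase in queues
    )
    return -mismatch


def build_preference_pair(
    queues: Dict[str, int],
    candidates: List[Candidate],
    critic: Callable[[Dict[str, int], TimingPlan], float],
) -> Tuple[Candidate, Candidate]:
    """Rank candidates by critic score and return (chosen, rejected)."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidates to form a pair")
    ranked = sorted(candidates, key=lambda c: critic(queues, c.plan), reverse=True)
    return ranked[0], ranked[-1]


if __name__ == "__main__":
    queues = {"NS": 30, "EW": 10}  # queued vehicles per phase (made-up numbers)
    candidates = [
        Candidate({"NS": 20, "EW": 40}, "Favour east-west to clear the side road."),
        Candidate({"NS": 45, "EW": 15}, "North-south queue is three times longer."),
    ]
    chosen, rejected = build_preference_pair(queues, candidates, toy_critic)
    print("chosen:", chosen.plan, "| rejected:", rejected.plan)
```

In this toy setting the plan whose green split follows the 3:1 queue ratio is chosen. A real evaluation network would score plans from learned traffic dynamics, not from a fixed heuristic.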
