ABSTRACT As a government‐regulated public service, traffic signal control (TSC) requires reliable and transparent decision‐making. Existing deep reinforcement learning (DRL) methods have improved control accuracy, but they still lack explainability and generalisation, which severely limits their applicability in real‐world environments. To address these challenges, this paper proposes GenEx‐TSC, a generalisable and explainable TSC method that integrates DRL with large language models (LLMs). First, starting from vehicle‐level states, we train a DRL agent that incorporates the physical heterogeneity of intersections and neighbourhood information; this agent provides the evaluation foundation for constructing a high‐quality LLM dataset. The LLM agent is then optimised through a two‐stage training mechanism. In the distillation stage, a lightweight LLM agent is trained on the reasoning trajectories of a larger‐scale LLM agent, inheriting its semantic understanding and decision‐generation capabilities. In the alignment stage, the DRL evaluation network is used to calibrate the outputs of the distilled LLM agent, so that the generated cycle‐level signal timing strategies are both efficient and interpretable. We synthesise 10 intersection networks with different physical attributes in SUMO and generate traffic flows of varying scales. Experimental results across these diverse traffic environments show that GenEx‐TSC has clear advantages over traditional methods, mainstream DRL methods and LLM baselines in control accuracy, generalisation and explainability.
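The abstract does not specify how the DRL evaluation network calibrates the distilled LLM agent. One plausible reading, sketched here purely as an illustration and not as the authors' method, is to let the critic score several LLM‐proposed cycle‐level timing plans and keep the best and worst as a (chosen, rejected) pair for preference‐based fine‐tuning. `TimingPlan`, `build_preference_pair` and `toy_critic` are hypothetical names, and the toy critic is an assumed stand‐in for a trained value network.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class TimingPlan:
    """A cycle-level plan: green duration (seconds) for each signal phase."""
    greens: Tuple[int, ...]


def build_preference_pair(
    candidates: Sequence[TimingPlan],
    critic: Callable[[TimingPlan], float],
) -> Tuple[TimingPlan, TimingPlan]:
    """Rank candidate plans by critic score; return (chosen, rejected).

    The highest-scoring plan is the preferred sample and the lowest-scoring
    plan the dispreferred one (an assumed preference-pair construction).
    """
    if len(candidates) < 2:
        raise ValueError("need at least two candidate plans")
    ranked = sorted(candidates, key=critic, reverse=True)
    return ranked[0], ranked[-1]


def toy_critic(queues: Sequence[int]) -> Callable[[TimingPlan], float]:
    """Hypothetical stand-in for a DRL value network.

    Scores a plan by how closely each phase's share of green time matches
    its share of queued vehicles (higher is better, 0 is a perfect match).
    """
    total_q = sum(queues)

    def score(plan: TimingPlan) -> float:
        total_g = sum(plan.greens)
        return -sum(abs(g / total_g - q / total_q)
                    for g, q in zip(plan.greens, queues))

    return score


if __name__ == "__main__":
    queues = [12, 4, 8, 2]  # made-up queue lengths per phase
    candidates = [
        TimingPlan((30, 10, 20, 10)),
        TimingPlan((20, 20, 20, 20)),
        TimingPlan((40, 10, 25, 5)),
    ]
    chosen, rejected = build_preference_pair(candidates, toy_critic(queues))
    print("chosen:", chosen.greens, "rejected:", rejected.greens)
```

With the made‐up queues above, the plan (40, 10, 25, 5) tracks demand most closely and is chosen, while the uniform plan (20, 20, 20, 20) is rejected. Using a critic only to rank outputs, rather than to generate them, keeps the LLM responsible for the readable timing rationale while the DRL signal decides which rationale is rewarded.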
Huang et al. (Thu,) studied this question.