What question did this study set out to answer?

The study aims to optimize metro train timetable rescheduling using multi-agent deep reinforcement learning to enhance reliability in urban rail systems.

April 1, 2026 · Open Access

Large-Scale Metro Train Timetable Rescheduling via Multi-Agent Deep Reinforcement Learning: A High-Dimensional Optimization Approach in Flatland Environment

Key Points

Developed a decentralized cooperative process for train timetable rescheduling.
Employed the Multi-Agent Advantage Actor-Critic (MAA2C) algorithm for dynamic scheduling.
Implemented the approach in the Flatland simulation environment to represent complex rail networks.
Designed a composite reward function that balances delay minimization and passenger satisfaction.
Introduced random disturbances during training to improve model robustness.
The MAA2C-based framework significantly outperformed traditional baselines across scenarios ranging from simple dual-track to complex random networks.
Achieved faster convergence in small-scale scenarios.
Demonstrated superior computational efficiency and scalability in large-scale environments.
Effectively reduced passenger waiting times while adhering to operational constraints.
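
The disturbance injection mentioned in the key points can be illustrated with a short sketch. The abstract states only that the disturbances follow a negative exponential distribution; everything else here is an assumption made for illustration, including the function names `sample_disturbance` and `perturb_timetable`, the per-departure disturbance probability, and the mean delay. The sketch also perturbs each departure independently and does not propagate delays downstream, which a real simulator would do.

```python
import random


def sample_disturbance(mean_delay_s: float, rng: random.Random) -> float:
    """Draw one delay in seconds from an exponential distribution with the given mean."""
    # random.expovariate takes the rate (1 / mean) as its argument.
    return rng.expovariate(1.0 / mean_delay_s)


def perturb_timetable(planned_departures, disturbance_prob, mean_delay_s, rng):
    """Return a copy of the planned departure times with random delays added.

    Each departure is disturbed independently with probability
    `disturbance_prob` (hypothetical parameter, not taken from the paper).
    """
    perturbed = []
    for planned in planned_departures:
        delay = 0.0
        if rng.random() < disturbance_prob:
            delay = sample_disturbance(mean_delay_s, rng)
        perturbed.append(planned + delay)
    return perturbed


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so training episodes are reproducible
    planned = [0.0, 120.0, 240.0, 360.0]  # departure times in seconds
    print(perturb_timetable(planned, disturbance_prob=0.3, mean_delay_s=30.0, rng=rng))
```

Sampling a fresh set of delays at the start of each training episode is one common way to expose a policy to varied conditions; whether the authors disturb departures, dwell times, or running times is not stated on this page.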

Abstract

Metro train timetable rescheduling (TTR) is a critical task for ensuring the reliability of urban rail transit systems. However, with the increasing density of railway networks and the growing number of operational trains, TTR has evolved into a typical high-dimensional and large-scale optimization problem. Traditional mathematical programming and heuristic approaches often struggle with the “curse of dimensionality” and fail to provide real-time responses under stochastic disturbances. To address these challenges, this paper proposes a novel framework based on Multi-Agent Deep Reinforcement Learning (MADRL). Specifically, we model the TTR problem as a decentralized cooperative process and utilize the Multi-Agent Advantage Actor-Critic (MAA2C) algorithm to optimize train schedules dynamically. The proposed framework is implemented within the Flatland simulation environment, which allows for the representation of complex arbitrary topologies. We design a composite reward function that minimizes total delay deviation while maximizing passenger satisfaction, subject to constraints such as headway, operating time, and train capacity. Furthermore, to enhance the robustness of the model against high-dimensional state uncertainties, random disturbances following a negative exponential distribution are introduced during training. Experimental results across various scenarios—ranging from simple dual-track to complex random networks—demonstrate that the MAA2C-based approach significantly outperforms traditional baselines. It not only achieves faster convergence in small-scale scenarios but also demonstrates superior computational efficiency and scalability in large-scale environments, effectively minimizing passenger waiting times. This study validates the potential of MADRL in solving high-dimensional traffic control problems for intelligent transportation systems.
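
To make the composite reward described in the abstract more concrete, here is a minimal sketch of one possible shape for such a function. It is not the authors' formulation: the weights, the use of passenger waiting time as a proxy for satisfaction, the treatment of headway and capacity constraints as fixed penalties, and all names (`TrainStep`, `composite_reward`, `min_headway_s`, `capacity`) are assumptions chosen for illustration. The operating-time constraint named in the abstract is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class TrainStep:
    """Hypothetical per-train observation at one decision step."""
    delay_deviation_s: float  # absolute deviation from the planned time, in seconds
    passenger_wait_s: float   # average passenger waiting time, in seconds
    headway_s: float          # time gap to the preceding train, in seconds
    load: int                 # passengers currently on board


def composite_reward(
    step: TrainStep,
    w_delay: float = 1.0,         # assumed weight on delay deviation
    w_wait: float = 0.5,          # assumed weight on passenger waiting time
    min_headway_s: float = 90.0,  # assumed minimum safe headway
    capacity: int = 1200,         # assumed train capacity
    penalty: float = 100.0,       # assumed penalty per violated constraint
) -> float:
    """Negative weighted cost, with a fixed penalty for each violated constraint."""
    reward = -(w_delay * step.delay_deviation_s + w_wait * step.passenger_wait_s)
    if step.headway_s < min_headway_s:
        reward -= penalty
    if step.load > capacity:
        reward -= penalty
    return reward


if __name__ == "__main__":
    feasible = TrainStep(delay_deviation_s=10.0, passenger_wait_s=20.0, headway_s=120.0, load=800)
    too_close = TrainStep(delay_deviation_s=10.0, passenger_wait_s=20.0, headway_s=60.0, load=800)
    print(composite_reward(feasible))   # -20.0
    print(composite_reward(too_close))  # -120.0
```

Folding constraints into the reward as penalties is a common simplification in reinforcement learning; the paper may instead enforce them through action masking or inside the simulator.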

Cite This Study

Yang et al. studied this question.

synapsesocial.com/papers/69ccb66716edfba7beb8808b
https://doi.org/10.3390/app16073338
