Urban rail transit (URT) stations, characterized by high passenger density and confined spaces, face significant challenges in emergency evacuations during peak hours. To address the complexity of station layouts and crowd dynamics, this study introduces a reinforcement learning–enhanced framework for simulating passenger flow dynamics in URT stations, integrating Wi-Fi probe data with Markov decision process (MDP)-driven path planning. By formulating pedestrian navigation as an MDP and optimizing decision policies through a dueling double deep Q-network (D3QN), the proposed method addresses the limitations of traditional A* algorithms in handling complex crowd behaviors and dynamic congestion patterns. Wi-Fi probes deployed across the station infrastructure capture real-time passenger trajectories, enabling data-driven calibration of the MDP state space and reward function. The D3QN architecture uses prioritized experience replay and adaptive exploration to balance path optimality with obstacle avoidance, achieving a 92.4% path similarity index (PSI) relative to ground-truth trajectories. Experimental validation at Shenzhen Children’s Palace Station in China demonstrated a 20.3% improvement in evacuation efficiency and an evacuation time error (ETE) of 5.9%, outperforming traditional A* and social force models. The framework identifies critical congestion zones at fare gate intersections and escalator merges, and reduces peak densities by 23.7% through anticipative rerouting. This work establishes a scalable paradigm for real-time crowd management in transportation hubs, bridging data-driven sensing with AI-optimized simulation.
Mo et al. studied this question.
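The abstract names a dueling double deep Q-network but does not spell out its two defining computations. The following is a minimal sketch of those two pieces in plain Python, not the authors' implementation: the dueling aggregation that combines a state value with per-action advantages, and the double-DQN target in which the online network selects the next action while the target network evaluates it. The function names, the three-action example, the reward of -1.0, and the discount factor of 0.5 are all illustrative assumptions; the abstract does not report the network layout, action set, reward function, or hyperparameters.

```python
from typing import List, Sequence


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + adv - mean_advantage for adv in advantages]


def double_dqn_target(
    reward: float,
    gamma: float,
    done: bool,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
) -> float:
    """Double-DQN target: the online network picks the action,
    the target network supplies that action's value."""
    if done:
        # Terminal transition (e.g. the pedestrian reached an exit): no bootstrap.
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical numbers for a state with three candidate moves.
q_online = dueling_q_values(state_value=1.0, advantages=[0.5, -0.5, 0.0])
q_target = [2.0, 3.0, 4.0]  # assumed target-network estimates for the next state

target = double_dqn_target(
    reward=-1.0,  # assumed per-step penalty
    gamma=0.5,    # assumed discount factor
    done=False,
    next_q_online=q_online,
    next_q_target=q_target,
)
print(q_online, target)
```

With these numbers the online network prefers action 0, so the target bootstraps from the target network's value for action 0 rather than from its maximum over all actions. Decoupling selection from evaluation in this way is what distinguishes the double-DQN target from the standard DQN target.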