The rapid increase in high-rise building construction has intensified the need for efficient elevator system operations. This paper addresses the elevator dispatching problem in elevator group control systems. We formulate the problem as a Semi-Markov Decision Process (SMDP), defining the state representation, action space, and reward function. A two-phase model is then introduced, integrating imitation learning and deep reinforcement learning techniques to derive the optimal elevator dispatching policy from the formulated SMDP. In the first phase, a policy network is pre-trained by estimating the time required for elevator cars to pick up assigned hall requests. In the second phase, the pre-trained policy network is further optimized using Proximal Policy Optimization (PPO), a well-known policy-based deep reinforcement learning method. Additionally, we propose a novel update interval, termed the “direct-effect” interval, which improves policy training during the reinforcement learning phase. Notably, this direct-effect interval concept has potential applicability to other multi-resource scheduling problems. Empirical experiments demonstrate the advantages of incorporating imitation learning before reinforcement learning, as well as the effectiveness of employing the direct-effect update interval during the reinforcement learning phase. Furthermore, the proposed model outperforms various benchmark rules in terms of average waiting time and the distribution of long waiting times, as validated across four traffic patterns.
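The two learning signals in the abstract can be made concrete with a minimal sketch. In an SMDP, consecutive dispatching decisions are separated by variable sojourn times, so returns are usually discounted per unit of elapsed time rather than per step. PPO then maximizes a clipped surrogate objective built from the probability ratio between the new and old policies. The sketch below assumes a `gamma ** tau` discount, a lump-sum reward per decision interval (for example, negative accumulated waiting time), and the standard PPO-Clip objective. It does not model the paper's "direct-effect" interval, whose exact definition is not given here. The helper names `smdp_returns` and `ppo_clip_objective` are hypothetical.

```python
import math
from typing import List, Sequence


def smdp_returns(rewards: Sequence[float],
                 sojourn_times: Sequence[float],
                 gamma: float = 0.99) -> List[float]:
    """Discounted SMDP returns: G_k = r_k + gamma**tau_k * G_{k+1}.

    Assumption (not from the paper): rewards[k] is the lump-sum reward earned
    between decision epochs k and k+1, and sojourn_times[k] is the time
    elapsed between those epochs. The last epoch bootstraps from zero.
    """
    if len(rewards) != len(sojourn_times):
        raise ValueError("rewards and sojourn_times must have equal length")
    returns = [0.0] * len(rewards)
    g = 0.0
    # Sweep backwards so each return reuses the one after it.
    for k in range(len(rewards) - 1, -1, -1):
        g = rewards[k] + gamma ** sojourn_times[k] * g
        returns[k] = g
    return returns


def ppo_clip_objective(new_logp: Sequence[float],
                       old_logp: Sequence[float],
                       advantages: Sequence[float],
                       eps: float = 0.2) -> float:
    """Mean PPO-Clip surrogate: E[min(rho * A, clip(rho, 1 - eps, 1 + eps) * A)].

    rho = pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
    This is the quantity PPO maximizes; a training loop would minimize its negative.
    """
    if not (len(new_logp) == len(old_logp) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Taking the minimum keeps the update pessimistic when the ratio drifts.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

For example, `smdp_returns([-1.0, -2.0], [1.0, 2.0], gamma=0.5)` gives `[-2.0, -2.0]`: the second interval's reward is discounted by `0.5 ** 1` before being added to the first. The `gamma ** tau` form is equivalent to the continuous-time discount `exp(-beta * tau)` with `gamma = exp(-beta)`. Which form the paper actually uses is not stated in this abstract.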
Wan et al. studied this question.