What question did this study set out to answer?

The research aims to improve elevator dispatching by integrating imitation learning with deep reinforcement learning techniques.

January 25, 2026

A Hybrid Approach of Imitation Learning and Deep Reinforcement Learning with Direct-Effect Update Interval for Elevator Dispatching

Key points

Formulated the elevator dispatching problem as a Semi-Markov Decision Process (SMDP)
Developed a two-phase model combining imitation learning with deep reinforcement learning
Pre-trained a policy network in the first phase using estimates of the time elevator cars need to pick up assigned hall requests
Optimized the policy with Proximal Policy Optimization (PPO) in the second phase
Introduced a novel 'direct-effect' update interval to improve policy training during the reinforcement learning phase
Demonstrated an improved dispatching policy with reduced average waiting times
Showed a more favorable distribution of long waiting times, validated across four traffic patterns
Outperformed various benchmark dispatching rules on both average waiting time and the distribution of long waiting times
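The results above are stated in terms of two measures: the average waiting time and the distribution of long waiting times. The summary does not say how the authors define a "long" wait, so the following is only a minimal sketch of how such measures could be computed. The function name `waiting_time_summary` and the 60 s and 90 s thresholds are hypothetical choices for illustration, not values from the study.

```python
from statistics import mean


def waiting_time_summary(waits, thresholds=(60.0, 90.0)):
    """Summarize passenger waiting times given in seconds.

    Returns the average wait and, for each threshold, the share of
    passengers who waited at least that long. The default thresholds
    are hypothetical, not taken from the study.
    """
    if not waits:
        raise ValueError("waits must be non-empty")
    summary = {"average": mean(waits)}
    for t in thresholds:
        # Fraction of passengers whose wait reached the threshold.
        summary[f">={t:g}s"] = sum(w >= t for w in waits) / len(waits)
    return summary
```

For example, waits of 10, 30, 70 and 95 seconds give an average of 51.25 s, with half of the passengers waiting at least 60 s and a quarter waiting at least 90 s. Reporting tail shares alongside the mean shows whether a policy lowers the average by letting a few passengers wait much longer.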

Abstract

The rapid increase in high-rise building construction has intensified the need for efficient elevator system operations. This paper addresses the elevator dispatching problem in elevator group control systems. We formulate the problem as a Semi-Markov Decision Process (SMDP), defining the state representation, action space, and reward function. A two-phase model is then introduced, integrating imitation learning and deep reinforcement learning techniques to derive the optimal elevator dispatching policy from the formulated SMDP. In the first phase, a policy network is pre-trained by estimating the time required for elevator cars to pick up assigned hall requests. In the second phase, the pre-trained policy network is further optimized using Proximal Policy Optimization (PPO), a well-known policy-based deep reinforcement learning method. Additionally, we propose a novel update interval, termed the “direct-effect” interval, which improves policy training during the reinforcement learning phase. Notably, this direct-effect interval concept has potential applicability to other multi-resource scheduling problems. Empirical experiments demonstrate the advantages of incorporating imitation learning before reinforcement learning, as well as the effectiveness of employing the direct-effect update interval during the reinforcement learning phase. Furthermore, the proposed model outperforms various benchmark rules in terms of average waiting time and the distribution of long waiting times, as validated across four traffic patterns.
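Framing dispatching as a Semi-Markov Decision Process matters because decisions (for example, assigning a newly registered hall call) occur at irregular times. Discounting should therefore depend on elapsed time rather than on the number of decisions. The sketch below illustrates this with a time-based discounted return. It assumes rewards arrive as lump sums at decision epochs and uses a hypothetical per-second discount factor `gamma`. The study's actual state, action and reward definitions are not given in this summary.

```python
def smdp_discounted_return(rewards, times, gamma=0.99):
    """Discounted return of one SMDP trajectory (illustrative sketch).

    rewards[k] is the lump-sum reward received at decision epoch k and
    times[k] is the continuous time (seconds) at which that epoch occurs.
    The discount depends on elapsed time, not on the step count:
        G = sum_k gamma ** (times[k] - times[0]) * rewards[k]
    The lump-sum reward model and per-second gamma are assumptions.
    """
    if len(rewards) != len(times):
        raise ValueError("rewards and times must have the same length")
    if not rewards:
        return 0.0
    t0 = times[0]
    return sum(gamma ** (t - t0) * r for r, t in zip(rewards, times))
```

With two unit rewards at t = 0 s and t = 2 s and gamma = 0.5, the time-based return is 1 + 0.25 = 1.25. Step-count discounting would instead give 1 + 0.5 = 1.5, which ignores how long the system actually waited between decisions.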
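The abstract says the first phase pre-trains the policy network using estimates of how long each car needs to pick up its assigned hall requests, but it does not give the procedure. One plausible reading, assumed here and not confirmed by the source, is behavior cloning. Under that reading, a teacher rule assigns each new hall call to the car with the smallest estimated pick-up time, and a softmax policy is fit to the teacher's choices with cross-entropy. This pure-Python sketch uses a linear car-scoring function in place of a deep network. The names `teacher_action`, `pretrain`, `act` and `make_demo_samples`, and the one-feature layout, are all hypothetical.

```python
import math
import random


def softmax(scores):
    # Numerically stable softmax over per-car scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def teacher_action(pickup_times):
    # Hypothetical teacher: pick the car with the smallest estimated pick-up time.
    return min(range(len(pickup_times)), key=pickup_times.__getitem__)


def car_scores(w, feats):
    # Linear score per car; a stand-in for the paper's policy network.
    return [sum(wj * xj for wj, xj in zip(w, x)) for x in feats]


def pretrain(samples, n_features, lr=0.1, epochs=200):
    """Behavior cloning by SGD on the cross-entropy to the teacher's choice.

    Each sample is (car_features, pickup_times): car_features[i] is the
    feature vector of car i and pickup_times[i] its estimated pick-up time.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for feats, times in samples:
            target = teacher_action(times)
            probs = softmax(car_scores(w, feats))
            # d(-log p[target])/dw = sum_i (p_i - 1[i == target]) * x_i
            for j in range(n_features):
                grad = sum((p - (i == target)) * x[j]
                           for i, (p, x) in enumerate(zip(probs, feats)))
                w[j] -= lr * grad
    return w


def act(w, feats):
    # Greedy action of the pre-trained policy: the highest-scoring car.
    scores = car_scores(w, feats)
    return max(range(len(scores)), key=scores.__getitem__)


def make_demo_samples(n=20, n_cars=4, seed=0):
    """Hypothetical toy data: one feature per car, its pick-up time scaled to [0, 1]."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        times = [rng.uniform(0.0, 60.0) for _ in range(n_cars)]
        samples.append(([[t / 60.0] for t in times], times))
    return samples
```

On the toy data, `pretrain(make_demo_samples(), n_features=1)` learns a negative weight, so the policy ranks cars by shortest pick-up time and reproduces the teacher's choices. In the hybrid scheme the abstract describes, weights obtained this way would only serve as a starting point that the reinforcement learning phase then refines.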
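The second phase fine-tunes the pre-trained policy with Proximal Policy Optimization (PPO). For reference, the core of standard PPO is the clipped surrogate objective of Schulman et al. (2017), sketched below in plain Python for a batch of sampled actions. This summary does not describe how the authors estimate advantages or how the direct-effect update interval selects the transitions that enter each update. The advantages are therefore taken as given inputs, and `clip_eps = 0.2` is a common default rather than a value from the study.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (to be minimized), averaged over samples.

    ratio_k = exp(new_logp[k] - old_logp[k])
    loss    = -mean_k min(ratio_k * A_k, clip(ratio_k, 1 - eps, 1 + eps) * A_k)
    Advantage estimation is outside the scope of this sketch.
    """
    n = len(advantages)
    if n == 0 or len(new_logp) != n or len(old_logp) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # probability ratio new/old for the taken action
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)  # pessimistic (lower) bound
    return -total / n
```

With a probability ratio of 1.5 and advantage +1, the objective is capped at 1.2, giving a loss of -1.2. With a ratio of 0.5 and advantage -1, the clipped term -0.8 is selected, giving a loss of 0.8. Clipping removes the incentive to move the new policy far from the old one in a single update, which is what makes PPO a reasonable choice for refining a policy that already starts from imitation.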
