Deep Reinforcement Learning (DRL) has recently been applied to eco-driving to reduce fuel consumption and travel time. Whereas previous studies combine simulators with model-free DRL (MFDRL), this work proposes a Safe Off-policy Model-Based Reinforcement Learning (SMORL) algorithm for eco-driving. SMORL integrates three key components: a computationally efficient model-based trajectory optimizer, a value function learned off-policy, and a learned safe set. Its advantages over the existing literature are threefold. First, combining off-policy learning with a physics-based model improves sample efficiency. Second, training does not require any extrinsic reward mechanism for constraint satisfaction. Third, trajectory feasibility is guaranteed by a safe set approximated with deep generative models. SMORL is benchmarked over 100 trips against a baseline controller representing human drivers, a non-learning-based optimal controller, a previously designed MFDRL strategy, and the wait-and-see optimal solution. In simulation, SMORL reduces fuel consumption by more than 21% relative to the baseline controller while keeping a comparable average speed, and it achieves better fuel economy while driving faster than both the MFDRL agent and the non-learning-based optimal controller.
Zhu et al. studied this problem.
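To make the interplay of the three components concrete, here is a minimal receding-horizon sketch in the general spirit of learning-based model predictive control. It is not the authors' SMORL formulation, which the abstract does not specify. Everything below is a hypothetical stand-in: point-mass kinematics replace the physics-based powertrain model, a hand-written `fuel_rate` replaces a real fuel map, `value_fn` replaces the off-policy-learned value function, and `in_safe_set` replaces the deep-generative safe set with an analytic "can still brake to a stop before a stop line at `STOP_LINE`" test. The optimizer enumerates short acceleration sequences, adds the value function as a terminal cost, and accepts only sequences whose terminal state lies in the safe set. Feasibility therefore comes from a constraint rather than from a reward penalty.

```python
import itertools
from dataclasses import dataclass

# All constants are hypothetical values chosen for illustration only.
DT = 1.0                    # control step [s]
ACCELS = (-1.0, 0.0, 1.0)   # candidate accelerations [m/s^2]
HORIZON = 4                 # planning horizon [steps]
V_MIN, V_MAX = 0.0, 20.0    # speed limits [m/s]
B_MAX = 1.0                 # maximum braking deceleration [m/s^2]
STOP_LINE = 60.0            # hypothetical stop line position [m]
ROUTE_LENGTH = 1000.0       # route length [m]


@dataclass(frozen=True)
class State:
    s: float  # position along the route [m]
    v: float  # speed [m/s]


def step(x: State, a: float) -> State:
    """Point-mass kinematics standing in for a physics-based vehicle model."""
    return State(x.s + x.v * DT + 0.5 * a * DT ** 2, x.v + a * DT)


def fuel_rate(x: State, a: float) -> float:
    """Hypothetical fuel map: idle term plus a term proportional to positive tractive power."""
    tractive = a + 0.0005 * x.v ** 2  # acceleration plus a drag-like term
    return 0.1 + 0.02 * x.v * max(tractive, 0.0)


def value_fn(x: State) -> float:
    """Hypothetical stand-in for a learned value function (cost-to-go grows with remaining distance)."""
    return 0.05 * max(ROUTE_LENGTH - x.s, 0.0)


def in_safe_set(x: State) -> bool:
    """Hypothetical stand-in for a learned safe set: the vehicle can still stop before STOP_LINE."""
    return V_MIN <= x.v <= V_MAX and x.s + x.v ** 2 / (2 * B_MAX) <= STOP_LINE


def plan(x0: State):
    """Enumerate acceleration sequences; minimize stage fuel cost plus terminal value.

    Returns (best_sequence, best_cost), or (None, inf) if no sequence keeps the
    speed within limits and ends inside the safe set.
    """
    best_cost, best_seq = float("inf"), None
    for seq in itertools.product(ACCELS, repeat=HORIZON):
        x, cost, feasible = x0, 0.0, True
        for a in seq:
            cost += fuel_rate(x, a) * DT
            x = step(x, a)
            if not (V_MIN <= x.v <= V_MAX):  # per-step speed constraint
                feasible = False
                break
        if feasible and in_safe_set(x):      # terminal safe-set constraint
            cost += value_fn(x)              # terminal cost from the value function
            if cost < best_cost:
                best_cost, best_seq = cost, seq
    return best_seq, best_cost
```

In a receding-horizon loop, only the first acceleration of the returned sequence would be applied before replanning from the next state. Starting too close to the stop line at too high a speed, for example `plan(State(50.0, 5.0))`, returns `(None, inf)` because no admissible sequence ends in the safe set. A learned safe set plays the same role in rejecting such plans, though SMORL presumably uses it differently in detail.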