Deep reinforcement learning has made significant progress in autonomous driving and holds considerable potential for future intelligent transportation systems. At unsignalized intersections where autonomous and human-driven vehicles coexist, the irregular and unpredictable behavior of human drivers poses a major challenge to the decision-making systems of autonomous vehicles. To address this, this paper models the passage of autonomous vehicles through unsignalized intersections as a Markov Decision Process (MDP) and controls the autonomous vehicles with an improved Proximal Policy Optimization (PPO) method. We introduce a self-attention mechanism as a new input structure for the Actor network of the original PPO model, yielding a variant termed SA-PPO. This input structure more efficiently captures the relationships among the state information of autonomous and human-driven vehicles, such as their relative positions and speeds, which enables the model to choose more effective actions. We trained and tested the proposed method on the gym platform and compared it with the original PPO algorithm on several evaluation metrics, including cumulative reward, average vehicle speed, and success rate. The results show that SA-PPO outperforms the original PPO.
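The abstract does not specify the network details, so the following is only a minimal sketch of a self-attention input layer for a PPO actor. It makes several assumptions that are not stated in the paper: each vehicle is described by one row of features `[rel_x, rel_y, v_x, v_y]` relative to the autonomous vehicle; row 0 is the autonomous (ego) vehicle; and the action space is discrete, with three hypothetical actions. The names `self_attention`, `actor_forward`, and `init_params` are illustrative rather than taken from the paper, and the weights are random instead of trained. The sketch uses plain Python so it runs without a deep learning framework. It omits the critic network and the PPO clipped-objective update.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def self_attention(states, Wq, Wk, Wv):
    """Scaled dot-product self-attention over per-vehicle state rows.

    Returns the attended feature row for every vehicle and the attention
    weights each vehicle places on every other vehicle.
    """
    Q = [matvec(Wq, s) for s in states]
    K = [matvec(Wk, s) for s in states]
    V = [matvec(Wv, s) for s in states]
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of value vectors, one output dimension at a time.
        outputs.append([sum(w[j] * V[j][i] for j in range(len(V)))
                        for i in range(len(V[0]))])
    return outputs, weights


def init_params(d_in, d_model, n_actions, rng):
    """Hypothetical random initialisation; a trained actor would learn these."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {"Wq": mat(d_model, d_in), "Wk": mat(d_model, d_in),
            "Wv": mat(d_model, d_in), "Wpi": mat(n_actions, d_model)}


def actor_forward(states, params):
    """Actor: attend over all vehicles, then map the ego row to action probabilities."""
    attended, weights = self_attention(states, params["Wq"], params["Wk"], params["Wv"])
    ego = attended[0]  # assumption: row 0 is the autonomous (ego) vehicle
    logits = matvec(params["Wpi"], ego)
    return softmax(logits), weights[0]


if __name__ == "__main__":
    rng = random.Random(0)
    params = init_params(d_in=4, d_model=8, n_actions=3, rng=rng)
    # Hypothetical normalised observation: ego vehicle plus three human-driven vehicles.
    states = [[0.0, 0.0, 0.8, 0.0],
              [0.6, -0.15, -0.6, 0.0],
              [-0.25, 0.5, 0.0, -0.7],
              [1.0, 0.2, -0.4, 0.0]]
    probs, ego_attention = actor_forward(states, params)
    print("action probabilities:", probs)
    print("ego attention over vehicles:", ego_attention)
```

No positional encoding is added here, so reordering the human-driven vehicles leaves the ego vehicle's action distribution unchanged, up to floating-point rounding. That order independence may be one reason to prefer attention over a flat concatenated input. The paper itself motivates the choice only as processing the relationships between vehicle states more efficiently.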
Jiang et al. studied this question.