Deep reinforcement learning has been extensively studied for traffic signal control owing to its ability to process large amounts of information and achieve superior control performance. However, this method acquires flow-specific policies during learning, so its performance under traffic flows not encountered during training is not guaranteed. Moreover, the traffic signal control problem formulation implies that the optimal policy differs for each traffic flow ratio, owing to the trade-off between the orthogonal roads at an intersection. Therefore, multiple policies must be switched between to avoid performance degradation when the traffic flow changes. In this study, we use multi-objective reinforcement learning to exhaustively determine the policy corresponding to each traffic flow ratio. These policies are then switched according to the current traffic flow ratio to achieve control that adapts flexibly to traffic flow changes. Compared with rule-based and single-objective reinforcement learning methods, the proposed method achieves the shortest average travel times in all environments, both for stationary traffic and for traffic flows with varying flow ratios.
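The switching step can be illustrated with a minimal sketch. It assumes, purely for illustration, that one policy has been trained per flow ratio on a fixed grid, that the flow ratio is measured as the share of vehicles on one of the two orthogonal roads, and that the policy trained for the nearest ratio is selected. The names `flow_ratio`, `select_policy`, and `TRAINED_RATIOS`, the grid values, and the nearest-ratio rule are all hypothetical and are not taken from the paper.

```python
from bisect import bisect_left

# Hypothetical grid of flow ratios for which a policy was trained (sorted).
TRAINED_RATIOS = [0.1, 0.3, 0.5, 0.7, 0.9]


def flow_ratio(count_ns: int, count_ew: int) -> float:
    """Share of observed vehicles on the north-south road, in [0, 1].

    Returns 0.5 (balanced) when no vehicles were observed.
    """
    total = count_ns + count_ew
    return 0.5 if total == 0 else count_ns / total


def select_policy(ratio: float, trained_ratios: list[float]) -> int:
    """Index of the trained ratio closest to the observed ratio.

    `trained_ratios` must be sorted in ascending order and non-empty.
    Ties go to the lower ratio.
    """
    i = bisect_left(trained_ratios, ratio)
    if i == 0:
        return 0
    if i == len(trained_ratios):
        return len(trained_ratios) - 1
    before, after = trained_ratios[i - 1], trained_ratios[i]
    return i if after - ratio < ratio - before else i - 1


# Example: 30 vehicles north-south, 10 east-west in the last window.
current = flow_ratio(30, 10)
policy_index = select_policy(current, TRAINED_RATIOS)
```

In this sketch the controller would re-estimate the ratio over a recent observation window and act with the policy at `policy_index` until the next estimate. How the ratio is actually estimated and how often policies are switched are design choices the abstract does not specify.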
Saiki et al. (Sun,) studied this question.