Abstract Multi-Agent Reinforcement Learning (MARL) has demonstrated significant potential in cooperative trajectory planning for connected and automated vehicles (CAVs), where it addresses multi-vehicle coordination in complex dynamic environments. However, handling environmental uncertainty while planning safe and efficient trajectories remains a critical challenge, particularly in unpredictable or rapidly changing traffic scenarios, where existing algorithms often generalize poorly and offer insufficient safety guarantees. To address these challenges, this paper proposes a MARL framework based on Twin Delayed Deep Deterministic Policy Gradient (TD3) with two key contributions. First, it models environmental uncertainties as time-varying Gaussian distributions and incorporates them into the reward function design, enabling agents to better quantify risk and uncertainty in their decision-making. Second, it employs dual critic networks, delayed policy updates, and target policy smoothing to enhance system stability and robustness. Experimental results demonstrate that the proposed framework significantly outperforms conventional Deep Deterministic Policy Gradient (DDPG) approaches in convergence speed, collision avoidance success rate, and adaptability to environmental variations, achieving safe and reliable cooperative trajectory planning in dynamic scenarios with multiple moving obstacles.
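The three TD3 mechanisms named in the abstract (dual critics, delayed policy updates, and target policy smoothing) can be illustrated with a minimal NumPy sketch of the generic single-agent TD3 target computation. This is not the paper's implementation: the function names `td3_target` and `should_update_actor`, the callable interfaces for the target actor and critics, and the hyperparameter values (commonly used TD3 defaults, not values reported here) are all assumptions made for illustration.

```python
import numpy as np


def td3_target(reward, done, next_state, actor_target, critic1_target,
               critic2_target, rng, gamma=0.99, policy_noise=0.2,
               noise_clip=0.5, max_action=1.0):
    """Compute the TD3 bootstrap target for a batch of transitions.

    All callables and default values are hypothetical placeholders.
    """
    next_action = actor_target(next_state)

    # Target policy smoothing: perturb the target action with clipped
    # Gaussian noise so the critic target is averaged over nearby actions.
    noise = rng.normal(0.0, policy_noise, size=next_action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, -max_action, max_action)

    # Dual critics: use the element-wise minimum of the two target critics
    # to counteract overestimation of the action value.
    q1 = critic1_target(next_state, next_action)
    q2 = critic2_target(next_state, next_action)
    q_next = np.minimum(q1, q2)

    # Terminal transitions (done == 1) do not bootstrap.
    return reward + gamma * (1.0 - done) * q_next


def should_update_actor(step, policy_delay=2):
    """Delayed policy updates: refresh the actor every `policy_delay` steps."""
    return step % policy_delay == 0
```

In a training loop under these assumptions, both critics would be regressed toward `td3_target(...)` at every step, while the actor and the target networks would be updated only on steps where `should_update_actor(step)` is true.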
Ji et al. (Mon,) studied this question.