This review examines reinforcement learning (RL) methods for dynamic speed control in connected and autonomous vehicle (CAV) environments, covering variable speed limits, platooning, and speed harmonization. Focusing on studies from 2017 to 2025, it analyzes algorithmic choices (value-based, policy-gradient, actor-critic, and multi-agent RL), state-action design, and reward engineering, as well as deployment assumptions about communication, penetration rates, and mixed traffic. Simulation results generally indicate improvements in safety (≈8%-50%), traffic efficiency (≈7%-57%), fuel consumption (≈6%-20%), and throughput (≈12%-30%), with multi-agent approaches performing more robustly at moderate CAV penetration (30%-50%). However, these benefits are highly scenario-dependent and often rely on idealized communication, limited fleet sizes, and non-standardized evaluation. Real-world tests remain scarce and consistently underperform their simulated counterparts, highlighting a significant sim-to-real gap. The review identifies key research priorities in scalable multi-agent RL (MARL) architectures, safety-constrained learning, robust sim-to-real transfer, and standardized benchmarking to support deployment-oriented adoption of RL-based speed control in future CAV-enabled traffic systems.
Alhmiedat et al. studied this question.