What question did this study set out to answer?

This study aims to improve the longitudinal control of vehicle platooning under challenging driving conditions, such as hard braking and varying friction.

May 21, 2026
Open Access

Safety-Filtered Residual Reinforcement Learning over Model Predictive Control for Friction-Aware Autonomous Vehicle Platooning

Key Points

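Developed a control architecture that pairs a scalar Kalman filter, which smooths a friction-related signal for constraint tightening, with a model predictive control (MPC) backbone.
Integrated a bounded proximal policy optimization (PPO) residual policy that refines the MPC command during transient events, followed by a command-level safety projection.
Evaluated the method in simulation on a CARLA digital twin of a six-vehicle platoon and in hardware-in-the-loop (HIL) tests on an automotive ECU.
The control stack improved spacing regulation and maintained non-amplifying disturbance propagation according to the reported string-stability indices.
Reduced a route-normalized proxy for positive tractive energy at the wheels by approximately 12% relative to Manual MPC.
Reduced the same energy proxy by up to 18% relative to a PID-CACC reference, which does not enforce hard constraints and can collide under the tested disturbances.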
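The Kalman-filter step can be illustrated with a minimal sketch. It assumes a random-walk model for the friction coefficient and a hypothetical tightening rule that scales the braking limit with the estimate and lengthens the time headway on low friction. The class `ScalarKF`, the function `tightened_limits`, and all noise and tuning values are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

G = 9.81  # gravitational acceleration [m/s^2]


@dataclass
class ScalarKF:
    """Scalar Kalman filter for a slowly varying friction coefficient.

    Assumed (hypothetical) model: random walk mu_k = mu_{k-1} + w_k with
    w_k ~ N(0, q), observed as z_k = mu_k + v_k with v_k ~ N(0, r).
    """

    mu: float = 0.8  # initial friction estimate (assumption)
    p: float = 0.1   # initial estimate variance (assumption)
    q: float = 1e-4  # process-noise variance (tuning assumption)
    r: float = 1e-2  # measurement-noise variance (tuning assumption)

    def update(self, z: float) -> float:
        # Predict: a random walk keeps the mean and inflates the variance.
        p_pred = self.p + self.q
        # Correct: weight the innovation by the Kalman gain.
        k = p_pred / (p_pred + self.r)
        self.mu += k * (z - self.mu)
        self.p = (1.0 - k) * p_pred
        return self.mu


def tightened_limits(mu_hat: float, v: float, eta: float = 0.8,
                     standstill_gap: float = 5.0, h_nom: float = 1.0,
                     mu_ref: float = 0.9) -> tuple[float, float]:
    """Hypothetical friction-dependent constraint tightening.

    Returns (maximum braking deceleration [m/s^2], minimum gap [m]).
    """
    mu = min(max(mu_hat, 0.1), 1.0)    # clamp to a plausible range
    a_brake_max = eta * mu * G         # tyre-limited braking capability
    h = h_nom * max(1.0, mu_ref / mu)  # longer time headway on low friction
    d_min = standstill_gap + h * v     # constant-time-headway spacing
    return a_brake_max, d_min


if __name__ == "__main__":
    kf = ScalarKF()
    for z in [0.80, 0.78, 0.45, 0.42, 0.40]:  # e.g. entering a wet patch
        mu_hat = kf.update(z)
        print(f"mu_hat={mu_hat:.3f}  limits={tightened_limits(mu_hat, v=20.0)}")
```

The filter only smooths the signal it is given. As the abstract notes, the friction signal in the CARLA study comes from simulator annotations, so this sketch says nothing about onboard friction estimation.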
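The residual-plus-safety-filter idea can be sketched for a single scalar acceleration command. This is a hypothetical illustration. It assumes `tanh` squashing to bound the PPO residual, a one-step constant-acceleration gap prediction as the clearance constraint, and made-up actuator and jerk limits. `bounded_residual`, `safety_project`, and `fast_tick` are illustrative names. Because the command is one-dimensional, the Euclidean projection onto the safe interval reduces to a clamp.

```python
import math


def bounded_residual(raw_action: float, delta_max: float) -> float:
    """Squash an unbounded policy output into [-delta_max, delta_max]."""
    return delta_max * math.tanh(raw_action)


def safety_project(a_cmd: float, *, a_prev: float, gap: float, v_ego: float,
                   v_lead: float, a_lead: float, d_min: float,
                   a_brake_max: float = 6.0, a_accel_max: float = 2.0,
                   jerk_max: float = 5.0, dt: float = 0.02) -> float:
    """Project a scalar acceleration command onto a hypothetical safe interval.

    dt = 0.02 s matches a 50 Hz fast tick; all limits are illustrative.
    """
    # Actuator limits combined with a jerk (rate) limit around the last command.
    lo = max(-a_brake_max, a_prev - jerk_max * dt)
    hi = min(a_accel_max, a_prev + jerk_max * dt)
    # Clearance: the one-step constant-acceleration gap prediction
    #   gap + (v_lead - v_ego)*dt + 0.5*(a_lead - a)*dt**2 >= d_min
    # rearranged into an upper bound on the ego acceleration a.
    a_clear = a_lead + 2.0 * (gap + (v_lead - v_ego) * dt - d_min) / dt**2
    hi = min(hi, a_clear)
    if lo > hi:
        # Conflict: give clearance priority over the jerk limit, but never
        # command more braking than the actuator can deliver.
        return max(-a_brake_max, hi)
    # For a scalar, the Euclidean projection onto [lo, hi] is a clamp.
    return min(max(a_cmd, lo), hi)


def fast_tick(u_mpc: float, raw_rl: float, delta_max: float, **state: float) -> float:
    """MPC command plus bounded PPO residual, then the safety projection."""
    return safety_project(u_mpc + bounded_residual(raw_rl, delta_max), **state)


if __name__ == "__main__":
    cruise = dict(a_prev=0.5, gap=30.0, v_ego=20.0, v_lead=20.0, a_lead=0.0, d_min=25.0)
    print(fast_tick(0.5, 0.0, 0.3, **cruise))  # 0.5: inside all limits
    emergency = dict(a_prev=0.0, gap=25.0, v_ego=20.0, v_lead=15.0, a_lead=-6.0, d_min=25.0)
    print(fast_tick(1.0, 5.0, 0.3, **emergency))  # -6.0: full braking
```

Giving the clearance constraint priority over the jerk limit when the two conflict is a design choice of this sketch. The paper's projection may resolve such conflicts differently.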

Abstract

This paper presents a deployment-oriented longitudinal platoon-control architecture for connected and autonomous vehicles operating under repeated leader hard-braking, cut-ins, and spatially varying road friction. The proposed stack combines four elements: (i) a lightweight scalar Kalman filter (KF) that smooths a friction-related signal and feeds friction-dependent constraint tightening; (ii) a model predictive control (MPC) backbone whose weights and horizon are selected offline using multi-objective GA/NSGA-II tuning; (iii) a bounded proximal policy optimization (PPO) residual policy, trained with the aid of a learned surrogate model, that refines the MPC command during transient events; and (iv) a command-level safety projection that enforces instantaneous actuation and clearance constraints at the fast control tick. The contribution is therefore not a new MPC formulation or a new reinforcement-learning algorithm in isolation, but an integrated and experimentally characterized control stack that keeps the safety-critical structure explicit while using learning to improve transient behavior. The method is evaluated in a CARLA digital twin of a six-vehicle platoon over a 5 km mixed urban–highway route and is further assessed in hardware-in-the-loop (HIL) on an automotive ECU using a multi-rate ROS 2/AUTOSAR implementation (50 Hz estimation/safety loop, 10 Hz MPC/RL refresh). Across 10 held-out disturbance seeds, the full stack improves spacing regulation, maintains non-amplifying disturbance propagation according to the reported string-stability indices, and reduces a route-normalized positive tractive-energy-at-the-wheels proxy by about 12% relative to Manual MPC and by up to 18% relative to a PID-CACC reference. Because the PID-CACC baseline does not enforce hard constraints and can collide under the tested disturbance suite, the main performance comparison is among collision-free controllers. 
The friction signal used in CARLA is derived from simulator road-surface annotations before filtering, so the present study should be interpreted as a friction-aware control and integration study rather than a validated onboard friction-estimation result. Likewise, the reported energy metric is an effort proxy and is not a calibrated fuel or battery consumption model.
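One common way to check for non-amplifying disturbance propagation is to compare a norm of each follower's response with the same norm for its predecessor. A ratio at or below 1 along the whole platoon means disturbances do not grow toward the tail. The summary does not state which string-stability indices the paper reports, so the sketch below is a generic illustration. It assumes per-vehicle time series (for example, acceleration) and offers the L2 and peak norms as options. The function names are hypothetical.

```python
import math
from typing import Callable, Sequence

Norm = Callable[[Sequence[float]], float]


def l2_norm(x: Sequence[float]) -> float:
    return math.sqrt(sum(v * v for v in x))


def peak_norm(x: Sequence[float]) -> float:
    return max(abs(v) for v in x)


def string_stability_ratios(signals: Sequence[Sequence[float]],
                            norm: Norm = l2_norm) -> list[float]:
    """Ratio ||s_i|| / ||s_{i-1}|| for each follower i (index 0 is the leader).

    signals[i] is a time series for vehicle i, e.g. its acceleration or its
    spacing error over one disturbance episode (an assumption of this sketch).
    """
    ratios = []
    for prev, cur in zip(signals, signals[1:]):
        den, num = norm(prev), norm(cur)
        if den == 0.0:
            ratios.append(0.0 if num == 0.0 else math.inf)
        else:
            ratios.append(num / den)
    return ratios


def is_non_amplifying(signals: Sequence[Sequence[float]], norm: Norm = l2_norm,
                      tol: float = 1e-9) -> bool:
    """True if no follower amplifies its predecessor's response."""
    return all(r <= 1.0 + tol for r in string_stability_ratios(signals, norm))


if __name__ == "__main__":
    # Six vehicles; each follower attenuates the leader's braking pulse by 10%.
    base = [0.0, -4.0, -4.0, -1.0, 0.0]
    platoon = [[a * 0.9 ** i for a in base] for i in range(6)]
    print(string_stability_ratios(platoon), is_non_amplifying(platoon))
```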
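A positive tractive-energy-at-the-wheels proxy can be computed by integrating only the positive part of wheel power, P = F·v, and dividing by the distance travelled. The sketch assumes a simple longitudinal road-load model, F = m·a + ½·ρ·C_dA·v² + m·g·(C_rr·cos θ + sin θ), with illustrative vehicle parameters. It is an effort proxy in the same spirit as the paper's metric, not its exact definition, and like the paper's metric it is not a fuel or battery model. `positive_tractive_energy_per_m` and `percent_reduction` are hypothetical names.

```python
import math
from typing import Optional, Sequence


def positive_tractive_energy_per_m(v: Sequence[float], a: Sequence[float], dt: float,
                                   grade: Optional[Sequence[float]] = None,
                                   mass: float = 1500.0, cd_a: float = 0.7,
                                   c_rr: float = 0.01, rho: float = 1.225,
                                   g: float = 9.81) -> float:
    """Route-normalized positive tractive energy at the wheels [J/m].

    Road-load model and parameters are illustrative assumptions.
    1 J/m is numerically equal to 1 kJ/km.
    """
    if grade is None:
        grade = [0.0] * len(v)  # flat road, grade angle in radians
    energy = 0.0    # [J]
    distance = 0.0  # [m]
    for vk, ak, th in zip(v, a, grade):
        force = (mass * ak                                          # inertia
                 + 0.5 * rho * cd_a * vk ** 2                       # aero drag
                 + mass * g * (c_rr * math.cos(th) + math.sin(th)))  # rolling + grade
        energy += max(force * vk, 0.0) * dt  # keep only positive wheel power
        distance += vk * dt
    return energy / distance if distance > 0.0 else 0.0


def percent_reduction(e_baseline: float, e_new: float) -> float:
    """Relative reduction of the proxy with respect to a baseline, in percent."""
    return 100.0 * (e_baseline - e_new) / e_baseline


if __name__ == "__main__":
    steady = positive_tractive_energy_per_m([20.0] * 100, [0.0] * 100, dt=0.1)
    print(f"{steady:.2f} J/m at a steady 20 m/s")
```

Because braking power is clipped to zero rather than credited, this proxy penalizes unnecessary accelerate-then-brake cycles, which is why smoother transients can lower it.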


Cite This Study

Allahloh et al. (2026) studied this question.

synapsesocial.com/papers/6a0ea188be05d6e3efb60498
https://doi.org/10.3390/machines14050560