February 28, 2026 · Open Access

Autonomous Navigation of an Unmanned Underwater Vehicle via Safe Reinforcement Learning and Active Disturbance Rejection Control

Key Points

The study proposes a two-layer control framework for obstacle-avoidance navigation of unmanned underwater vehicles (UUVs).
The lower layer uses an active disturbance rejection controller (ADRC) for tracking and disturbance rejection.
The upper layer combines the twin delayed deep deterministic policy gradient (TD3) algorithm with a control barrier function (CBF)-based quadratic programming (QP) safety filter and safety-inspired reward shaping (SR).
Two simulation studies evaluate the method: velocity and attitude control, and obstacle-avoidance navigation.
In simulation, ADRC tracks faster and rejects disturbances more strongly than a conventional proportional–integral–derivative (PID) controller.
The TD3 + QP + SR scheme learns faster and produces smoother trajectories than the RL baselines.
The proposed scheme also improves safety performance relative to those baselines.
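
To make the lower-layer idea concrete, the following is a minimal sketch of a linear ADRC loop, not the controller from the paper. It assumes a hypothetical first-order plant `y' = -a*y + d + b0*u` with a constant disturbance `d`; the function name, the gains `kp` and `wo`, and all numerical values are illustrative assumptions. An extended state observer estimates the "total disturbance" (everything in the plant other than `b0*u`), and the control law cancels that estimate.

```python
def simulate_adrc(r=1.0, d=0.5, a=1.0, b0=2.0, kp=5.0, wo=20.0,
                  dt=1e-3, t_end=5.0):
    """Linear ADRC on a toy first-order plant (illustrative assumption only).

    Returns the final output y and the final total-disturbance estimate z2.
    """
    # Observer gains from a bandwidth parameterisation: both poles at -wo.
    beta1, beta2 = 2.0 * wo, wo ** 2
    y = z1 = z2 = 0.0  # plant output, output estimate, disturbance estimate
    for _ in range(int(t_end / dt)):
        # Control law: proportional action minus the estimated disturbance.
        u = (kp * (r - z1) - z2) / b0
        # Extended state observer, forward-Euler update.
        e = y - z1
        z1 += dt * (z2 + b0 * u + beta1 * e)
        z2 += dt * (beta2 * e)
        # Toy plant; its "total disturbance" is f = -a*y + d.
        y += dt * (-a * y + d + b0 * u)
    return y, z2


if __name__ == "__main__":
    y_final, f_hat = simulate_adrc()
    print(f"final output: {y_final:.4f}, disturbance estimate: {f_hat:.4f}")
```

In this toy setting the output settles at the setpoint without an explicit integral term, because the observer state `z2` absorbs both the unknown plant dynamics and the constant disturbance.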

Abstract

A two-layer control framework for unmanned underwater vehicle (UUV) navigation is proposed, combining a lower-layer active disturbance rejection controller (ADRC) with an upper-layer safe reinforcement learning (RL) policy for obstacle-avoidance navigation. The lower layer, utilizing ADRC, ensures high tracking accuracy and effective disturbance rejection, while the upper layer integrates the twin delayed deep deterministic policy gradient (TD3) algorithm, combined with a control barrier function (CBF)-based quadratic programming (QP) safety filter and safety-inspired reward shaping (SR). The method is evaluated in two simulation studies: (i) velocity and attitude control to assess tracking and disturbance rejection, and (ii) obstacle-avoidance navigation to assess learning efficiency, trajectory smoothness, and safety-related metrics. Simulation results show that ADRC achieves faster tracking and stronger disturbance rejection than a conventional proportional–integral–derivative (PID) controller. Moreover, the proposed TD3 + QP + SR scheme exhibits faster learning, smoother trajectories, and improved safety performance compared with RL baselines. These results indicate that the proposed framework enables efficient and safe UUV navigation in simulation scenarios with obstacles and disturbances.
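
The safety-filter idea can be sketched as follows. This is a toy illustration, not the paper's formulation: it assumes 2-D single-integrator dynamics `x' = u`, one circular obstacle, and the barrier function `h(x) = ||x - c||^2 - r^2`, none of which come from the abstract. The filter solves `min ||u - u_rl||^2` subject to the CBF condition `grad_h(x) @ u >= -alpha * h(x)`. With a single affine constraint the QP has a closed-form solution (projection onto a half-space), so no solver is needed here; the function name and `alpha` are hypothetical.

```python
import numpy as np


def cbf_qp_filter(u_rl, x, obstacle_center, obstacle_radius, alpha=1.0):
    """Minimally modify the RL action u_rl so that the CBF condition holds.

    Assumes x' = u and h(x) = ||x - c||^2 - r^2 (illustrative assumptions).
    """
    diff = x - obstacle_center
    h = diff @ diff - obstacle_radius ** 2  # h >= 0 means outside the obstacle
    a = 2.0 * diff                          # gradient of h, so h' = a @ u
    b = -alpha * h                          # CBF condition: a @ u >= b
    violation = b - a @ u_rl
    if violation <= 0.0:
        return u_rl.copy()                  # RL action already satisfies it
    norm_sq = a @ a
    if norm_sq < 1e-12:
        return u_rl.copy()                  # degenerate case: x at the centre
    # Closed-form QP solution: project u_rl onto the half-space a @ u >= b.
    return u_rl + (violation / norm_sq) * a


if __name__ == "__main__":
    x = np.array([2.0, 0.0])
    center = np.array([0.0, 0.0])
    u_safe = cbf_qp_filter(np.array([-2.0, 0.5]), x, center, 1.0)
    print(u_safe)
```

For an action heading straight at the obstacle, the filter reduces only the component along the barrier gradient and leaves the tangential component unchanged. A realistic UUV model with several obstacles and actuator limits would give a QP with multiple constraints, which would need a numerical QP solver rather than this closed form.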
