What question did this study set out to answer?

The article investigates the effectiveness of a deep reinforcement learning approach for autonomous underwater navigation in challenging environments.

April 3, 2026 | Open Access

Deep Reinforcement Learning for Autonomous Underwater Navigation: A Comparative Study with DWA and Digital Twin Validation

Key Points

The BlueROV2, an open platform for scientific experimentation, served as the test vehicle
The navigation policy was trained with deep reinforcement learning based on Proximal Policy Optimization (PPO)
The PPO policy was compared against the Dynamic Window Approach (DWA), a deterministic kinematic planner
Evaluation was conducted in a realistic simulation and validated on a physical BlueROV2 supervised by a 3D digital twin of the test site
The PPO policy consistently outperformed DWA in highly cluttered environments
The advantage is attributed to better local adaptation and fewer collisions
Experiments showed that the learned behavior transfers from simulation to the real world

Abstract

Autonomous navigation in underwater environments is challenged by the absence of GPS, degraded visibility, and submerged obstacles. This article investigates these issues using the BlueROV2, an open platform for scientific experimentation. We propose a deep reinforcement learning approach based on the Proximal Policy Optimization (PPO) algorithm, using an observation space that combines target-oriented navigation information, a virtual occupancy grid, and raycasting along the boundaries of the operational area. This information is encoded into a high-dimensional observation space of 84 dimensions, providing the agent with comprehensive local and global situational awareness. The learned policy is compared against a reference deterministic kinematic planner, the Dynamic Window Approach (DWA), a robust baseline for obstacle avoidance. The evaluation is conducted in a realistic simulation environment and complemented by validation on a physical BlueROV2 supervised by a 3D digital twin of the test site, reducing risks associated with real-world experimentation. The results show that the PPO policy consistently outperforms DWA in highly cluttered environments, notably thanks to better local adaptation and reduced collisions. Finally, experiments demonstrate the transferability of the learned behavior from simulation to the real world, confirming the relevance of deep RL for autonomous navigation in underwater robotics.
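
For readers unfamiliar with the baseline, the following is a minimal sketch of a single Dynamic Window Approach step. It is not the implementation evaluated in the paper. It assumes a planar unicycle model (forward speed v, yaw rate w), circular obstacles, and a simplified score built from distance to goal, clearance, and speed. The function names, limits, and weights are all hypothetical illustration values.

```python
import math


def _linspace(lo, hi, n):
    """Return n evenly spaced samples in [lo, hi] (just lo when n == 1)."""
    if n == 1:
        return [lo]
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]


def simulate(x, y, yaw, v, w, dt, horizon):
    """Roll an assumed planar unicycle model forward under constant (v, w)."""
    path = []
    for _ in range(int(round(horizon / dt))):
        yaw += w * dt
        x += v * math.cos(yaw) * dt
        y += v * math.sin(yaw) * dt
        path.append((x, y))
    return path


def dwa_step(state, goal, obstacles, *, v_max=0.5, w_max=1.0,
             a_max=0.5, alpha_max=2.0, dt=0.1, horizon=2.0,
             robot_radius=0.3, n_v=7, n_w=15,
             k_goal=1.0, k_clear=0.5, k_speed=0.1, clear_cap=2.0):
    """Pick the (v, w) command with the best score inside the dynamic window.

    state: (x, y, yaw, v, w); goal: (gx, gy); obstacles: [(ox, oy, radius)].
    All limits and weights are hypothetical illustration values.
    """
    x, y, yaw, v, w = state
    # Dynamic window: velocities reachable within one control period dt.
    v_lo, v_hi = max(0.0, v - a_max * dt), min(v_max, v + a_max * dt)
    w_lo, w_hi = max(-w_max, w - alpha_max * dt), min(w_max, w + alpha_max * dt)

    best_cmd, best_score = (0.0, 0.0), -math.inf  # fall back to stopping
    for cv in _linspace(v_lo, v_hi, n_v):
        for cw in _linspace(w_lo, w_hi, n_w):
            path = simulate(x, y, yaw, cv, cw, dt, horizon)
            # Smallest gap between the robot hull and any obstacle on the path.
            clearance = min(
                (math.hypot(px - ox, py - oy) - r - robot_radius
                 for px, py in path for ox, oy, r in obstacles),
                default=math.inf,
            )
            if clearance <= 0.0:
                continue  # trajectory collides: inadmissible
            fx, fy = path[-1]
            score = (-k_goal * math.hypot(goal[0] - fx, goal[1] - fy)
                     + k_clear * min(clearance, clear_cap)
                     + k_speed * cv)
            if score > best_score:
                best_cmd, best_score = (cv, cw), score
    return best_cmd
```

Under these assumptions, calling dwa_step((0, 0, 0, 0, 0), (5, 0), []) selects the fastest reachable forward speed with a near-zero yaw rate, and the planner returns the stop command (0.0, 0.0) when every sampled trajectory collides. Because the command is chosen only from short constant-velocity rollouts, a planner of this kind reacts purely locally, which is one plausible reason a learned policy could adapt better in dense clutter.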
