What question did this study set out to answer?

This research aims to improve underwater detection for Autonomous Underwater Vehicles by integrating image enhancement with detection processes.

June 9, 2026 · Open Access

E2E-AUD: An End-to-End Adaptive Underwater Detection Framework Integrating Physical Priors and Frequency-Adaptive Learning

Key Points

Developed the End-to-End Adaptive Underwater Detection framework (E2E-AUD) with a UnitModule for image enhancement and detection optimization.
Introduced linear deformable convolution to adaptively model complex targets.
Utilized Haar wavelet downsampling for effective preservation of boundary and texture details.
Achieved 86.2% mAP50 and 67.8% mAP50-95 on the DUO dataset, outperforming YOLOv12 by 3.0% and 2.7%.
On the URPC dataset, reached 84.3% mAP50 and 50.8% mAP50-95, surpassing LEFEN in localization metrics.
Maintained a real-time inference speed of 21.8 FPS with a computational complexity of 9.4 GFLOPs.
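The two accuracy figures quoted above differ only in how strictly a predicted box must overlap the ground truth. The sketch below assumes the common COCO-style convention, which this summary does not spell out: mAP50 counts a detection as correct at an intersection-over-union (IoU) of at least 0.50, while mAP50-95 averages over thresholds from 0.50 to 0.95 in steps of 0.05. The function names and the sample boxes are hypothetical and only illustrate why mAP50-95 is the stricter localization metric.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [(50 + 5 * k) / 100 for k in range(10)]


def thresholds_passed(pred, truth):
    """Return the IoU thresholds at which pred would count as a match for truth."""
    score = iou(pred, truth)
    return [t for t in IOU_THRESHOLDS if score >= t]


if __name__ == "__main__":
    truth = (0, 0, 100, 100)
    pred = (10, 0, 110, 100)  # hypothetical prediction, shifted 10 px to the right
    # IoU is 9000 / 11000, roughly 0.82: a hit for mAP50,
    # but a miss at the 0.85, 0.90 and 0.95 thresholds.
    print(iou(pred, truth), thresholds_passed(pred, truth))
```

A box that is only slightly misplaced therefore still contributes fully to mAP50 but loses credit at the tighter thresholds, which is why the gap between the two numbers reflects localization quality.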

Abstract

Underwater detection is crucial for the autonomous operation of Autonomous Underwater Vehicles (AUVs). However, underwater environments pose significant challenges, including severe image degradation, complex target deformation, and densely distributed small objects. Most existing methods treat image enhancement as an independent preprocessing module and rely on fixed-shape convolution kernels for feature extraction, which often leads to inconsistent optimization objectives and limited capability in handling irregular targets and fine-grained small-object details. To address these issues, we propose an End-to-End Adaptive Underwater Detection framework (E2E-AUD). A lightweight image enhancement module, UnitModule, is embedded into the detection network so that enhancement is jointly optimized with detection and directly serves downstream feature learning. In addition, linear deformable convolution (LDConv) is introduced into the backbone to adaptively model polymorphic targets, while Haar wavelet downsampling (HWD) is adopted to preserve boundary and texture information through frequency-domain analysis. Experiments on the DUO and URPC datasets show that E2E-AUD outperforms both general-purpose and underwater-specific detectors. On the DUO dataset, our model reaches 86.2% mAP50 and 67.8% mAP50-95, outperforming the recent YOLOv12 by 3.0% and 2.7%, respectively. On the highly turbid URPC dataset, it achieves 84.3% mAP50 and 50.8% mAP50-95, surpassing the competitive underwater-specific detector LEFEN in strict localization metrics. Furthermore, E2E-AUD maintains a real-time inference speed of 21.8 FPS at a computational cost of 9.4 GFLOPs, indicating a favorable balance between detection accuracy and deployment efficiency relative to previous methods.
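To illustrate why a Haar wavelet step can preserve boundary detail that ordinary pooling blurs, here is a minimal sketch of a single-level 2D Haar decomposition used as a 2x downsampler. It is not the authors' HWD module, which presumably operates on feature maps and adds learned layers. The function name `haar_downsample`, the subband labels, and the toy image are assumptions made for illustration only.

```python
def haar_downsample(img):
    """Split a 2D grid with even height and width into four half-resolution
    Haar subbands: ll (local average) and lh, hl, hh (detail bands)."""
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            row_ll.append((a + b + c + d) / 2)  # scaled 2x2 average
            row_lh.append((a - b + c - d) / 2)  # left/right difference
            row_hl.append((a + b - c - d) / 2)  # top/bottom difference
            row_hh.append((a - b - c + d) / 2)  # diagonal difference
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


if __name__ == "__main__":
    # Toy 4x4 image with a vertical edge between columns 0 and 1.
    image = [[0, 1, 1, 1] for _ in range(4)]
    ll, lh, hl, hh = haar_downsample(image)
    # 2x2 average pooling would keep only ll / 2; here the edge also
    # shows up as a non-zero response in the lh band.
    print(ll, lh, hl, hh)
```

Because the transform is orthonormal, the four half-resolution subbands together carry the same energy as the input, so resolution is halved without discarding the edge information. Stacking the subbands as channels is one plausible way to feed them to later layers.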
