Underwater object detection is crucial for the operation of Autonomous Underwater Vehicles (AUVs). However, underwater environments pose significant challenges, including severe image degradation, complex target deformation, and densely distributed small objects. Most existing methods treat image enhancement as an independent preprocessing module and rely on fixed-shape convolution kernels for feature extraction. As a result, the enhancement and detection stages are optimized toward inconsistent objectives, and the detector has limited capacity to handle irregular targets and fine-grained small-object details. To address these issues, we propose an End-to-End Adaptive Underwater Detection framework (E2E-AUD). A lightweight image enhancement module, UnitModule, is embedded into the detection network so that enhancement is jointly optimized with detection and directly serves downstream feature learning. In addition, linear deformable convolution (LDConv) is introduced into the backbone to adaptively model polymorphic targets, while Haar wavelet downsampling (HWD) is adopted to preserve boundary and texture information through frequency-domain analysis. Experiments on the DUO and URPC datasets show that E2E-AUD outperforms both general-purpose and underwater-specific detectors. On the DUO dataset, our model reaches 86.2% mAP50 and 67.8% mAP50-95, outperforming the recent YOLOv12 by 3.0% and 2.7%, respectively. On the highly turbid URPC dataset, it achieves 84.3% mAP50 and 50.8% mAP50-95, surpassing the competitive underwater-specific detector LEFEN by notable margins on strict localization metrics. Furthermore, E2E-AUD maintains a real-time inference speed of 21.8 FPS at a low computational cost of 9.4 GFLOPs, indicating a favorable balance between detection accuracy and deployment efficiency relative to previous methods.
Zhou et al. (Sun,) studied this question.
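
To make the idea behind Haar wavelet downsampling (HWD) concrete, the following is a minimal NumPy sketch of a single-level 2D Haar decomposition applied to a feature map. It is an illustration under stated assumptions, not the E2E-AUD implementation: the function name `haar_downsample`, the (C, H, W) layout, and the subband ordering are hypothetical choices, and the learned convolution that a full HWD block would typically apply to the concatenated subbands is omitted.

```python
import numpy as np


def haar_downsample(x: np.ndarray) -> np.ndarray:
    """Single-level 2D Haar decomposition used as a 2x downsampling step.

    x: feature map of shape (C, H, W) with even H and W (assumed layout).
    Returns an array of shape (4*C, H/2, W/2) holding, in order, the
    low-frequency band and the three detail bands.
    """
    if x.ndim != 3:
        raise ValueError("expected an array of shape (C, H, W)")
    _, h, w = x.shape
    if h % 2 or w % 2:
        raise ValueError("H and W must be even")

    # The four pixels of every non-overlapping 2x2 block.
    tl = x[:, 0::2, 0::2]  # top-left
    tr = x[:, 0::2, 1::2]  # top-right
    bl = x[:, 1::2, 0::2]  # bottom-left
    br = x[:, 1::2, 1::2]  # bottom-right

    # Orthonormal Haar basis on a 2x2 block (scale factor 1/2).
    low = (tl + tr + bl + br) / 2.0         # local average (coarse content)
    horiz_diff = (tl - tr + bl - br) / 2.0  # left-minus-right differences
    vert_diff = (tl + tr - bl - br) / 2.0   # top-minus-bottom differences
    diag_diff = (tl - tr - bl + br) / 2.0   # diagonal differences

    # Spatial resolution is halved; the detail moves into extra channels.
    return np.concatenate([low, horiz_diff, vert_diff, diag_diff], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((8, 32, 32))
    out = haar_downsample(feat)
    print(out.shape)
    # The transform is orthonormal, so total energy is preserved.
    print(np.isclose((feat ** 2).sum(), (out ** 2).sum()))
```

Unlike max pooling or a strided convolution, this step is invertible: the halved spatial resolution is compensated by a fourfold increase in channels, so edge and texture detail remains available to later layers instead of being discarded at the downsampling stage.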