What question did this study set out to answer?

The study asks whether a single detector can be made more robust for underwater object detection by jointly training it on RGB and forward-looking sonar (FLS) images that are not paired frame by frame.

February 12, 2026 | Open Access

Dual-Modal Vision–Sonar Object Detection for Underwater Robots Based on Deep Learning

Key Points

Developed a paired-sample-free joint training paradigm for RGB and forward-looking sonar (FLS) images, based on parameter sharing.
Trained on RGB and FLS images drawn from different datasets (SeaClear and Marine Debris FLS), with no frame-level pairing.
Evaluated robustness under six underwater degradation factors, including contrast loss and blur.
Preserved the original detector's parameter scale and inference cost; only RGB input is required at test time.
Improved mAP50 by more than 14 percentage points over RGB-only training under severe low-contrast corruption.
Showed consistent robustness gains over RGB-only training across the tested degradations.
Indicated that sonar-domain supervision acts as an auxiliary structural constraint during optimization, biasing the detector away from color and texture shortcuts.
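
For readers unfamiliar with the metric named above: mAP50 treats a predicted box as a correct detection when its intersection-over-union (IoU) with a ground-truth box of the same class is at least 0.5. The sketch below shows only that matching test. The function names and the box coordinates are hypothetical and are not taken from the paper; a full mAP50 computation also ranks predictions by confidence and averages precision over classes.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred_box, gt_box):
    """True when a prediction would count as a hit at the 0.5 IoU threshold."""
    return iou(pred_box, gt_box) >= 0.5


# Hypothetical boxes, for illustration only.
ground_truth = (0, 0, 10, 10)
close_prediction = (2, 0, 12, 10)   # overlap 80, union 120
loose_prediction = (5, 0, 15, 10)   # overlap 50, union 150

print(is_match_at_50(close_prediction, ground_truth))
print(is_match_at_50(loose_prediction, ground_truth))
```

A gain reported in percentage points is an absolute difference between two mAP50 values, not a relative change.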

Abstract

Applying state-of-the-art RGB object detectors (e.g., YOLOv8) to underwater scenes often yields unstable performance because scattering, absorption, illumination deficiency, and bandwidth-limited transmission severely corrupt image contrast and detail. Forward-looking sonar (FLS) remains informative in turbid or low-visibility water, yet its low resolution and weak semantics make conventional fusion architectures costly and difficult to deploy on resource-constrained robots. This paper proposes a paired-sample-free RGB–FLS joint training paradigm based on parameter sharing, in which RGB and FLS images from different datasets are used jointly during training without any frame-level pairing or architectural modification. The resulting model preserves the original detector's parameter scale and inference cost, and requires only RGB input at test time. Experiments on the SeaClear and Marine Debris FLS datasets under six representative underwater degradation factors (including contrast loss, blur, resolution reduction, color cast, and JPEG compression) show consistent robustness gains over RGB-only training. In particular, under severe low-contrast corruption, the proposed training strategy improves mAP50 by more than 14 percentage points compared with the RGB-only baseline. These results indicate that sonar-domain supervision functions as an auxiliary structural constraint during optimization rather than as a conventional multi-source enlargement of the training data. Forcing a shared-parameter detector to fit a texture-poor, geometry-dominant sonar domain biases the learned representation away from color/texture shortcuts and makes it more stable under adverse underwater degradations, without increasing deployment complexity.
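
The idea of training without frame-level pairing can be illustrated with a small batch scheduler. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how RGB and FLS batches are mixed, so the strict alternation used here, the function name `joint_batches`, and the sample identifiers are all hypothetical. The point it shows is that each batch is drawn independently from its own dataset, with no index correspondence between the two, and that every batch would update the same detector weights.

```python
import random


def joint_batches(rgb_samples, fls_samples, batch_size, steps, seed=0):
    """Yield (domain, batch) pairs drawn from two unpaired datasets.

    Assumption: domains alternate step by step. Mixing both domains
    inside one batch would be an equally plausible schedule.
    """
    rng = random.Random(seed)
    for step in range(steps):
        if step % 2 == 0:
            domain, pool = "rgb", rgb_samples
        else:
            domain, pool = "fls", fls_samples
        # Sample within one dataset only; nothing links an RGB frame
        # to a particular sonar frame.
        size = min(batch_size, len(pool))
        yield domain, rng.sample(pool, size)


# Hypothetical sample identifiers standing in for labeled images.
rgb_pool = ["rgb_001", "rgb_002", "rgb_003", "rgb_004"]
fls_pool = ["fls_001", "fls_002", "fls_003"]

for domain, batch in joint_batches(rgb_pool, fls_pool, batch_size=2, steps=4):
    # In a real training loop, the single shared-parameter detector
    # would compute its detection loss on this batch and take an
    # optimizer step, whichever domain the batch came from.
    print(domain, batch)
```

Because the sonar data is used only during training, a scheduler like this leaves the deployed model untouched: at test time the detector still takes RGB input alone, as the abstract states.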
