Applying state-of-the-art RGB object detectors (e.g., YOLOv8) to underwater scenes often yields unstable performance, because scattering, absorption, insufficient illumination, and bandwidth-limited transmission severely degrade image contrast and detail. Forward-looking sonar (FLS) remains informative in turbid or low-visibility water, yet its low resolution and weak semantics make conventional fusion architectures costly and difficult to deploy on resource-constrained robots. This paper proposes a paired-sample-free RGB–FLS joint training paradigm based on parameter sharing, in which RGB and FLS images from different datasets are used together during training without any frame-level pairing or architectural modification. The resulting model preserves the parameter count and inference cost of the original detector and requires only RGB input at test time. Experiments on the SeaClear and Marine Debris FLS datasets under six representative underwater degradation factors (including contrast loss, blur, resolution reduction, color cast, and JPEG compression) show consistent robustness gains over RGB-only training. In particular, under severe low-contrast corruption, the proposed training strategy improves mAP50 by more than 14 percentage points over the RGB-only baseline. These results indicate that sonar-domain supervision acts as an auxiliary structural constraint during optimization rather than as a conventional multi-source enlargement of the training data. Forcing a shared-parameter detector to fit a texture-poor, geometry-dominant sonar domain biases the learned representation away from color and texture shortcuts and makes it more stable under adverse underwater degradations, without increasing deployment complexity.
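As a rough illustration of what paired-sample-free joint training could look like at the data-loading level, the sketch below builds mixed batches from two unpaired sample lists and tags each sample with its domain, so that a single shared-parameter detector would see both domains in every optimization step. The abstract does not specify the mixing scheme, so the function name `mixed_batches`, the `fls_fraction` parameter, and the choice to define an epoch by the RGB set are all assumptions made for illustration only.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Sample = Tuple[str, str]  # (domain tag, sample identifier)


def mixed_batches(
    rgb: Sequence[str],
    fls: Sequence[str],
    batch_size: int,
    fls_fraction: float,
    seed: int = 0,
) -> Iterator[List[Sample]]:
    """Yield batches mixing unpaired RGB and FLS samples (hypothetical scheme)."""
    n_fls = round(batch_size * fls_fraction)
    n_rgb = batch_size - n_fls
    if n_rgb < 1 or n_fls < 0:
        raise ValueError("each batch needs at least one RGB sample")
    if n_fls > 0 and not fls:
        raise ValueError("fls_fraction > 0 requires a non-empty FLS set")

    rng = random.Random(seed)
    rgb_pool = list(rgb)
    rng.shuffle(rgb_pool)

    # Assumption: one epoch is one pass over the RGB set. Sonar samples are
    # drawn independently with replacement, so no frame-level pairing exists.
    for start in range(0, len(rgb_pool) - n_rgb + 1, n_rgb):
        batch = [("rgb", s) for s in rgb_pool[start:start + n_rgb]]
        batch += [("fls", s) for s in rng.choices(fls, k=n_fls)]
        rng.shuffle(batch)  # interleave domains within the batch
        yield batch
```

In a real pipeline each mixed batch would be passed to the same detector and the same detection loss, and at test time only RGB images would be fed to the model. The sonar share of each batch (for example `fls_fraction=0.25`) is a tunable assumption here, not a value reported in the text.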
Wang et al. studied this question.