Ship detection in aerial imagery is a critical task for maritime monitoring, yet achieving real-time performance on embedded platforms remains difficult. The difficulty stems from the large data volume of wide-format aerial images and the pronounced scale variation among vessels. To address these challenges, an optimized YOLOv8-based model is proposed. A Multi-Scale Fusion (MSF) module is incorporated into the backbone to enhance scale adaptability. In addition, a lightweight Group-Wise Scale Fusion Neck (GSF-Neck) with a parallel multi-branch structure is designed to perform adaptive multi-scale feature fusion while reducing computational overhead. The proposed model achieves a state-of-the-art mAP@0.5 of up to 94.55% on a dedicated aerial ship dataset, outperforming other major detectors. When deployed on an RK3588 embedded system with a sliding window strategy for processing individual 300 MB images, it maintains a stable processing speed of ≥2 fps. Compared to the baseline under identical conditions, the proposed model improves mAP by 1.4% at the cost of a 6.6% reduction in FPS, effectively balancing detection performance and computational efficiency.
Liu et al. (Thu,) studied this question.
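The sliding window strategy mentioned above can be illustrated with a minimal sketch of how a wide-format image might be split into overlapping tiles before detection. This is not the authors' implementation: the function names (`tile_starts`, `sliding_windows`, `to_image_coords`), the 640-pixel tile size, and the 20% overlap are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from typing import Iterator, List, Tuple

Window = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in full-image pixels


def tile_starts(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets along one axis so that tiles cover [0, length)."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    # Add a final tile flush with the border if the stride left a gap.
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts


def sliding_windows(width: int, height: int,
                    tile: int = 640, overlap: float = 0.2) -> Iterator[Window]:
    """Yield overlapping tile windows in row-major order.

    tile and overlap are assumed values, not taken from the paper.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    stride = max(1, int(tile * (1.0 - overlap)))
    for y in tile_starts(height, tile, stride):
        for x in tile_starts(width, tile, stride):
            yield (x, y, min(x + tile, width), min(y + tile, height))


def to_image_coords(box: Window, window: Window) -> Window:
    """Shift a tile-local (x1, y1, x2, y2) box into full-image coordinates."""
    ox, oy = window[0], window[1]
    x1, y1, x2, y2 = box
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)


if __name__ == "__main__":
    # Hypothetical image size, used only to show how many tiles result.
    windows = list(sliding_windows(20000, 15000))
    print(f"{len(windows)} tiles for a 20000 x 15000 image")
```

In such a scheme, each tile would be passed to the detector separately and its boxes mapped back with `to_image_coords`. Because adjacent tiles overlap, a vessel near a tile border may be detected twice, so a merging step such as non-maximum suppression over the full-image boxes is typically applied afterwards. Whether the paper uses this exact procedure is not stated in the abstract.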