Reliable real-time detection and counting of tuna during distant-water deck operations is critical for automated catch monitoring but remains challenging because of strong illumination variation, background clutter, and frequent occlusion. This study proposes YOLOv10n-EMCNet, an improved lightweight detector based on YOLOv10n that integrates an ESC-based C2f enhancement in the backbone, a Multi-Branch and Scale Modulation-Fusion Feature Pyramid Network (SMFPN) in the neck, and a Convolutional Attention Fusion Module (CAFM) in the head to improve fine-grained representation and multi-scale feature fusion. An end-to-end detection–tracking–counting pipeline is then built by combining the detector with DeepSORT and an ROI-based de-duplication strategy. On the tuna dataset, YOLOv10n-EMCNet achieved 94.84% mAP@0.5, 65.29% mAP@0.5:0.95, and 91.77% recall at 6.5 GFLOPs. In addition, a controlled comparison of DeepSORT, ByteTrack, and OC-SORT on challenging videos showed that DeepSORT provided the best overall balance among counting accuracy, identity stability, and runtime efficiency. In shipboard validation on four representative videos covering daytime high glare, nighttime low light, dense occlusion, and dense multi-target scenes, the proposed pipeline achieved an average counting accuracy of 91.4%, with an average relative error of 8.62% and an average absolute error of 1.25 fish per video, while operating at approximately 30 FPS on an RTX 4090D platform. These results provide encouraging preliminary evidence that the proposed method can support automated tuna monitoring under representative shipboard conditions.
Liu et al. studied this question.
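The counting stage described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the ROI-based de-duplication works, so the rule used here (a track ID is counted once, the first time its box center lies inside a rectangular ROI) is an assumption, as are the names `RoiCounter` and `counting_errors` and the track tuple layout. The error definitions are likewise assumed: the reported 91.4% accuracy and 8.62% relative error are consistent with accuracy = 1 - relative error, but the abstract does not state the formula.

```python
from dataclasses import dataclass, field


@dataclass
class RoiCounter:
    """Count each track ID at most once (hypothetical de-duplication rule)."""
    roi: tuple  # (x_min, y_min, x_max, y_max) in pixels; assumed layout
    counted_ids: set = field(default_factory=set)

    def _inside(self, cx, cy):
        x0, y0, x1, y1 = self.roi
        return x0 <= cx <= x1 and y0 <= cy <= y1

    def update(self, tracks):
        """tracks: iterable of (track_id, x1, y1, x2, y2) for one frame.

        Returns the running count after this frame.
        """
        for track_id, x1, y1, x2, y2 in tracks:
            cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
            # A tracker-assigned ID is added only the first time it is seen
            # inside the ROI, so a fish that lingers is not counted again.
            if track_id not in self.counted_ids and self._inside(cx, cy):
                self.counted_ids.add(track_id)
        return len(self.counted_ids)


def counting_errors(predicted, actual):
    """Return (absolute_error, relative_error, accuracy) for one video.

    Assumed definitions: relative_error = |pred - actual| / actual and
    accuracy = 1 - relative_error.
    """
    abs_err = abs(predicted - actual)
    rel_err = abs_err / actual
    return abs_err, rel_err, 1.0 - rel_err


# Toy per-frame tracker output (made-up coordinates, not data from the study).
frames = [
    [(1, 10, 10, 30, 30)],                                # track 1 outside ROI
    [(1, 110, 110, 130, 130), (2, 0, 0, 20, 20)],         # track 1 enters ROI
    [(1, 120, 120, 140, 140), (2, 105, 105, 125, 125)],   # track 2 enters ROI
]
counter = RoiCounter(roi=(100, 100, 200, 200))
for tracks in frames:
    total = counter.update(tracks)
```

In this toy run track 1 stays inside the ROI for two frames but contributes to the count only once. Any tracker that emits stable integer IDs could feed `update`; identity switches would inflate the count under this rule, which is one reason identity stability matters when comparing trackers.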