Real-time detection of tomato ripeness in complex greenhouse environments poses a dual challenge: foliage occlusion and overlapping fruit demand high detection accuracy, while the limited computational resources of harvesting robots require a lightweight model. To address this, we propose YOLO-FLBM, a lightweight, high-performance model built on an enhanced YOLOv8s architecture. First, the backbone network is reconstructed with FasterNet to reduce redundancy, providing a streamlined foundation for edge deployment. Second, a new neck architecture, designated the LB Neck, is constructed by integrating the C2f-LS module with the BiFPN structure. Third, a novel Multi-scale Coordinate Dynamic Attention (MCDA) mechanism is developed: by combining hybrid perception pooling with full-rank kernel generation, MCDA dynamically captures spatial dependencies to mitigate occlusion. Experimental results on a custom tomato dataset show that YOLO-FLBM achieves precision, recall, mAP@50, and mAP@50–95 of 95.2%, 91.9%, 97.4%, and 78.9%, respectively, improvements of 3.7%, 2.5%, 1.9%, and 1.7% over the baseline model. Meanwhile, the model’s parameter count is reduced to 3.743 M, a 61.9% reduction relative to the original model. These results confirm the model’s efficiency and accuracy and offer a valuable reference for automated tomato harvesting robots.
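As a reading aid, the reported figures can be turned back into the baseline values they imply. The sketch below is illustrative arithmetic only and is not part of the proposed method. It assumes that the stated improvements are absolute percentage-point differences (the abstract does not say whether they are absolute or relative) and that the 61.9% reduction is measured against the baseline parameter count. The function names are hypothetical.

```python
# Reported YOLO-FLBM metrics (%) and stated gains over the baseline, taken from the abstract.
reported = {"precision": 95.2, "recall": 91.9, "mAP@50": 97.4, "mAP@50-95": 78.9}
gains = {"precision": 3.7, "recall": 2.5, "mAP@50": 1.9, "mAP@50-95": 1.7}


def implied_baseline_metrics(reported, gains):
    """ASSUMPTION: gains are absolute percentage points, so baseline = reported - gain."""
    return {name: round(reported[name] - gains[name], 1) for name in reported}


def implied_baseline_params(params_m, reduction_pct):
    """Baseline parameter count (millions) implied by a relative reduction in percent."""
    return params_m / (1.0 - reduction_pct / 100.0)


baseline_metrics = implied_baseline_metrics(reported, gains)
baseline_params_m = implied_baseline_params(3.743, 61.9)

for name, value in baseline_metrics.items():
    print(f"{name}: {value:.1f}%")
print(f"implied baseline parameters: {baseline_params_m:.2f} M")
```

Under these assumptions the baseline would sit at 91.5% precision, 89.4% recall, 95.5% mAP@50, and 77.2% mAP@50–95, with roughly 9.82 M parameters. If the improvements were instead relative, the implied baseline metrics would differ.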
Su et al. studied this question.