Accurate detection of tomato fruits is a critical component of vision-guided robotic harvesting systems, which play an increasingly important role in automated agriculture. However, the task is complicated by variable lighting conditions and background clutter in natural environments. In addition, the arbitrary orientations of fruits reduce the effectiveness of traditional horizontal bounding boxes. To address these challenges, we propose a novel object detection framework named SN-YOLO. First, we introduce the StarNet backbone to enhance the extraction of fine-grained features, thereby improving detection performance in cluttered backgrounds. Second, we design a Color-Prior Spatial-Channel Attention (CPSCA) module that incorporates red-channel priors to strengthen the model's focus on salient fruit regions. Third, we implement a multi-level attention fusion strategy that promotes effective feature integration across layers, enhancing background suppression and object discrimination. Finally, we adopt oriented bounding boxes (OBBs), which improve localization precision by aligning more closely with the actual fruit shapes and poses. Experiments on a custom tomato dataset show that SN-YOLO outperforms the baseline YOLOv8 OBB, achieving a 1.0% improvement in precision and a 0.8% increase in mAP@0.5. These results confirm the robustness and accuracy of the proposed method under complex field conditions. Overall, SN-YOLO provides a practical and efficient solution for fruit detection in automated harvesting systems, contributing to the deployment of computer vision techniques in smart agriculture.
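To illustrate why an axis-aligned box fits a rotated object loosely, the sketch below computes the corners of an oriented box from a center, size, and angle, then measures how much of its enclosing horizontal box the oriented box actually fills. This is a generic geometry example, not the SN-YOLO implementation; the `(cx, cy, w, h, theta)` parameterization with `theta` in radians and the helper names `obb_corners`, `enclosing_hbb`, and `hbb_fill_ratio` are assumptions chosen for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def obb_corners(cx: float, cy: float, w: float, h: float, theta: float) -> List[Point]:
    """Return the four corners of an oriented box.

    Assumed convention (hypothetical): (cx, cy) is the center, w and h are the
    side lengths, and theta is a rotation angle in radians applied to the
    box's local axes.
    """
    c, s = math.cos(theta), math.sin(theta)
    # Corner offsets in the box's own (unrotated) frame.
    local = [(w / 2, h / 2), (-w / 2, h / 2), (-w / 2, -h / 2), (w / 2, -h / 2)]
    # Rotate each offset by theta, then translate to the center.
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in local]


def enclosing_hbb(corners: List[Point]) -> Tuple[float, float, float, float]:
    """Smallest horizontal (axis-aligned) box (x1, y1, x2, y2) containing all corners."""
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return min(xs), min(ys), max(xs), max(ys)


def hbb_fill_ratio(w: float, h: float, theta: float) -> float:
    """Fraction of the enclosing horizontal box that the oriented box occupies.

    A value of 1.0 means the horizontal box fits exactly; smaller values mean
    the horizontal box includes more background around the object.
    """
    x1, y1, x2, y2 = enclosing_hbb(obb_corners(0.0, 0.0, w, h, theta))
    return (w * h) / ((x2 - x1) * (y2 - y1))


if __name__ == "__main__":
    for w, h, theta in [(4.0, 1.0, 0.0), (1.0, 1.0, math.pi / 4), (4.0, 1.0, math.pi / 4)]:
        print(f"w={w}, h={h}, theta={theta:.3f}: fill ratio {hbb_fill_ratio(w, h, theta):.2f}")
```

For an unrotated box the ratio is 1.00, while a 4:1 box rotated by 45 degrees fills only about 32% of its horizontal bounding box, so most of the horizontal box is background. This is the geometric intuition behind preferring oriented boxes for objects at arbitrary orientations.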
Chen et al. studied this question.