What does this research mean for the field?

An integrated visual recognition system that combines the lightweight YOLO-FES detection model with the M-YOLACT segmentation network enables grape-picking robots to reach an mAP@0.5 of 95.37% for cluster and peduncle detection and an 89.2% harvesting success rate in complex orchard environments. Novelty: methodological. Consensus alignment: neutral.

What question did this study set out to answer?

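The study asks how to improve grape detection and picking-point localization for harvesting robots in complex orchard scenes, where illumination variation, occlusion, and interference from branches and leaves cause missed detections and poor peduncle segmentation.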

May 16, 2026 | Open Access

Research on Visual Recognition and Harvesting Point Localization System for Grape-Picking Robots in Smart Agriculture

Key Points

The aim is to improve grape detection and picking-point localization for harvesting robots.
Developed YOLO-FES, a lightweight YOLOv7-derived model for detecting grape clusters and peduncles.
Combined the GJK algorithm with nearest-neighbor rectangular discrimination to associate clusters with their peduncles, and built M-YOLACT, an improved YOLACT network, to segment the peduncles.
Used a stereo depth camera to obtain the two-dimensional picking point and recover its three-dimensional spatial coordinates.
Achieved mAP@0.5 of 95.37% for grape clusters and peduncles with YOLO-FES.
M-YOLACT achieved mAP@0.5 values of 95.73% and 94.36% for bounding boxes and masks respectively.
Overall harvesting success rate was 89.2%, with an average of 11 seconds per harvesting operation.
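
The step from a two-dimensional picking point to three-dimensional coordinates can be illustrated with the standard pinhole back-projection used with depth cameras. This is a minimal sketch, not the authors' implementation: the `Intrinsics` values, the pixel location, and the depth reading are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Tuple


class Intrinsics(NamedTuple):
    """Hypothetical pinhole intrinsics in pixels (not values from the paper)."""
    fx: float  # focal length along x
    fy: float  # focal length along y
    cx: float  # principal point x
    cy: float  # principal point y


def pixel_to_camera(u: float, v: float, depth_m: float,
                    k: Intrinsics) -> Tuple[float, float, float]:
    """Back-project pixel (u, v) with metric depth into camera coordinates."""
    if depth_m <= 0:
        # Depth cameras commonly report 0 where no valid reading exists.
        raise ValueError("depth must be positive")
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


# Assumed example: a picking point at pixel (380, 180), 0.5 m from the camera.
k = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
point = pixel_to_camera(380.0, 180.0, 0.5, k)
print(point)
```

The result is expressed in the camera frame. A robot would still need a calibrated camera-to-arm transform, which this sketch does not cover.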

Abstract

To improve grape target perception and picking-point positioning for intelligent harvesting robots, this study develops a vision-based method for orchard grape detection and harvesting-point localization. The method is intended to address missed detections, insufficient recognition accuracy, and unsatisfactory peduncle segmentation caused by illumination variation, occlusion, and interference from branches and leaves in complex orchard scenes. For grape cluster and peduncle detection, a lightweight YOLOv7-derived model, termed YOLO-FES, was established. In this model, FasterNet and SCConv were introduced to refine the backbone and neck structures, and the EMA mechanism was incorporated to lower parameter complexity and computational cost while improving detection performance. For suspended grape structure association and peduncle extraction, the GJK algorithm was combined with nearest-neighbor rectangular discrimination, and an improved YOLACT-based peduncle segmentation network, named M-YOLACT, was constructed. With the integration of the MLCA mechanism and the Mish activation function, accurate peduncle segmentation was achieved. In addition, a stereo depth camera was employed to obtain two-dimensional picking-point information and further recover the corresponding three-dimensional spatial coordinates. Experimental results showed that the mAP@0.5 of YOLO-FES for grape clusters and peduncles reached 95.37%. For grape peduncle segmentation, the mAP@0.5 values of the bounding boxes and masks produced by M-YOLACT reached 95.73% and 94.36%, respectively. The proposed method achieved an overall harvesting success rate of 89.2%, with an average time consumption of 11 s for a single harvesting operation. By integrating deep-learning-based detection and segmentation with binocular-vision localization, this study provides a practical technical solution and useful reference for the visual system design of grape-harvesting robots.
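
The association of each grape cluster with its peduncle can be sketched as a nearest-neighbor search over detection boxes. The abstract does not give the authors' exact procedure, so the following is a simplified stand-in under stated assumptions: boxes are axis-aligned, the minimum distance between two boxes is computed in closed form (the quantity that GJK returns for general convex shapes), and the `max_gap` threshold and all coordinates are hypothetical.

```python
from math import hypot
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def box_gap(a: Box, b: Box) -> float:
    """Minimum distance between two axis-aligned boxes; 0.0 if they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)


def match_peduncles(clusters: List[Box], peduncles: List[Box],
                    max_gap: float = 20.0) -> List[Optional[int]]:
    """For each cluster, return the index of the nearest peduncle within max_gap, else None."""
    matches: List[Optional[int]] = []
    for c in clusters:
        best: Optional[int] = None
        best_gap = float("inf")
        for i, p in enumerate(peduncles):
            gap = box_gap(c, p)
            if gap <= max_gap and gap < best_gap:
                best, best_gap = i, gap
        matches.append(best)
    return matches


# Hypothetical detections: two clusters with a peduncle just above each, one without.
clusters = [(100, 200, 220, 380), (400, 210, 500, 360), (600, 200, 700, 350)]
peduncles = [(430, 150, 450, 205), (150, 140, 170, 198)]
matches = match_peduncles(clusters, peduncles)
print(matches)
```

In this sketch one peduncle may be assigned to several clusters. A stricter one-to-one assignment, or a check that the peduncle lies above the cluster, would be natural refinements.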

Cite This Study

Lin et al. studied this question.

synapsesocial.com/papers/6a080b27a487c87a6a40d4ce https://doi.org/10.3390/agriculture16101073
