At present, fruit picking relies mainly on manual labor. For litchi (Litchi chinensis Sonn.) picking robots, visual perception is often degraded by illumination variation, occlusion, and the difficulty of judging maturity, which lower recognition accuracy and lead to inaccurate fruit localization. This study aims to establish an embodied perception mechanism based on “perception-reasoning-execution” to enhance the visual perception and decision-making capability of the robot in complex orchard environments. First, a Y-LitchiC instance segmentation method is proposed to achieve high-precision segmentation of litchi clusters. Second, a generative artificial intelligence (AI) model is introduced to assess fruit maturity and occlusion, providing auxiliary support for automatic picking. Based on these auxiliary judgments, two types of dynamic harvesting decisions are formulated for subsequent operations. For unoccluded main fruit-bearing branches, a skeleton thinning algorithm is applied within the segmented region to extract the skeleton line, and the midpoint of the skeleton is used as the picking point for the first type of localization and harvesting decision. For main fruit-bearing branches occluded by leaves, threshold-based segmentation combined with maximum connected component extraction is first employed to obtain the target region, and skeleton thinning is then applied to that region to complete the second type of dynamic picking decision. Experimental results show that the Y-LitchiC model improves the mean average precision (mAP) by 1.6% compared with the YOLOv11s-seg model, achieving higher accuracy in litchi cluster segmentation and recognition. The generative AI model provides higher-level reasoning and decision-making capabilities for automatic picking.
Overall, the proposed embodied perception mechanism and dynamic picking strategies effectively enhance the autonomous perception and decision-making of the picking robot in complex orchard environments, providing a reliable theoretical basis and technical support for accurate fruit localization and precision picking.
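The two localization branches described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not name the thinning algorithm, the threshold criterion, or how the skeleton midpoint is defined, so the sketch assumes Zhang-Suen thinning, a fixed grayscale threshold, 8-connectivity, and a midpoint taken as the middle skeleton pixel in row-major order. All function names and the default threshold of 128 are hypothetical.

```python
from collections import deque

def threshold(gray, t):
    """Binary mask: 1 where intensity exceeds t (fixed threshold is an assumption)."""
    return [[1 if v > t else 0 for v in row] for row in gray]

def largest_component(mask):
    """Keep only the largest 8-connected foreground component."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            comp, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                comp.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(comp) > len(best):
                best = comp
    out = [[0] * w for _ in range(h)]
    for y, x in best:
        out[y][x] = 1
    return out

def thin(mask):
    """Zhang-Suen thinning (assumed algorithm); pixels outside the image count as 0."""
    img = [row[:] for row in mask]
    h, w = len(img), len(img[0])
    # Neighbours P2..P9, clockwise starting from north.
    offs = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

    def neighbours(y, x):
        return [img[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
                for dy, dx in offs]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for y in range(h):
                for x in range(w):
                    if not img[y][x]:
                        continue
                    p = neighbours(y, x)
                    b = sum(p)  # number of foreground neighbours
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    if step == 0:
                        cond = p[0] * p[2] * p[4] == 0 and p[2] * p[4] * p[6] == 0
                    else:
                        cond = p[0] * p[2] * p[6] == 0 and p[0] * p[4] * p[6] == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_clear.append((y, x))
            for y, x in to_clear:  # delete in parallel after the scan
                img[y][x] = 0
            changed = changed or bool(to_clear)
    return img

def skeleton_midpoint(skel):
    """Middle skeleton pixel in row-major order (a simplification that suits
    roughly vertical branches); returns (row, col) or None if the skeleton is empty."""
    pts = sorted((y, x) for y, row in enumerate(skel) for x, v in enumerate(row) if v)
    return pts[len(pts) // 2] if pts else None

def picking_point(region, occluded, t=128):
    """Type 1 (unoccluded): `region` is already a binary branch mask.
    Type 2 (occluded): `region` is a grayscale crop, cleaned by thresholding
    and largest-component extraction before thinning."""
    mask = largest_component(threshold(region, t)) if occluded else region
    return skeleton_midpoint(thin(mask))
```

In this sketch the occlusion flag is simply passed in as a boolean; in the described pipeline it would come from the generative AI model's judgment. A production system would more likely use library routines for thinning and connected components and a geodesic midpoint along the skeleton path.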
Zhou et al. (Mon,) studied this question.