Severe foliage occlusion and dynamically changing lighting conditions in complex orchard environments pose significant challenges for visual perception systems in automated apple harvesting, including low detection accuracy, poor robustness, and insufficient real-time performance. To address these issues, this study proposes an improved lightweight detection network based on YOLOv11, named YOLO-WBL, along with a precise yield estimation algorithm based on 3D point clouds, termed CLV. The YOLO-WBL network is optimized in three aspects: (1) a C3K2WT module integrating wavelet transform is introduced into the backbone network to enhance multi-scale feature extraction capability; (2) a weighted bidirectional feature pyramid network (BiFPN) is adopted in the neck network to improve the efficiency of multi-scale feature fusion; (3) a lightweight shared convolution separated batch normalization detection head (Detect-SCGN) is designed to significantly reduce the parameter count while maintaining accuracy. Based on this detection model, the CLV algorithm deeply integrates depth camera point cloud information through 3D coordinate mapping, irregular point cloud reconstruction, and convex hull volume calculation to achieve accurate estimation of individual fruit volume and total yield. Experimental results demonstrate that: (1) the YOLO-WBL model achieves a precision of 93.8%, recall of 79.3%, and mean average precision (mAP@0.5) of 87.2% on the apple test set; (2) the model size is only 3.72 MB, a reduction of 28.87% compared to the baseline model; (3) when deployed on an NVIDIA Jetson Xavier NX edge device, its inference speed reaches 8.7 FPS, meeting real-time requirements; (4) in scenarios with an occlusion rate below 40%, the mean absolute percentage error (MAPE) of yield estimation can be controlled within 8%. Experimental validation was conducted using apple images selected from the dataset under varying lighting intensities and fruit occlusion conditions.
The results demonstrate that the CLV algorithm significantly outperforms traditional average-weight-based estimation methods. This study provides an efficient, accurate, and deployable visual solution for intelligent apple harvesting and yield estimation in complex orchard environments, offering practical reference value for advancing smart orchard production.
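The geometric steps named above (3D coordinate mapping, convex hull volume calculation, and MAPE evaluation) can be illustrated with a minimal sketch. This is not the authors' CLV implementation: the function names, the pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy`, and the brute-force hull routine are all assumptions chosen for illustration. The hull routine is a simple O(n^4) method that assumes no four hull points are coplanar and uses a scale-dependent tolerance `eps`; in practice a library routine such as SciPy's `scipy.spatial.ConvexHull`, which exposes a `volume` attribute, would likely be preferred.

```python
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def deproject(u: float, v: float, depth: float,
              fx: float, fy: float, cx: float, cy: float) -> Point:
    """Map pixel (u, v) with a depth value to camera-frame XYZ.

    Assumes an ideal pinhole camera without lens distortion
    (hypothetical intrinsics: focal lengths fx, fy; principal point cx, cy).
    """
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def _sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Point, b: Point) -> Point:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def convex_hull_volume(points: Sequence[Point], eps: float = 1e-12) -> float:
    """Brute-force convex hull volume for a small 3D point set.

    A triple of points is a hull facet when every other point lies on one
    side of its plane. The volume is the sum of the tetrahedra formed by
    each facet and the centroid, which lies inside the hull.
    Assumes no four hull points are coplanar (illustrative only).
    """
    n = len(points)
    if n < 4:
        return 0.0
    centroid = (sum(p[0] for p in points) / n,
                sum(p[1] for p in points) / n,
                sum(p[2] for p in points) / n)
    volume = 0.0
    for a, b, c in combinations(points, 3):
        normal = _cross(_sub(b, a), _sub(c, a))
        if _dot(normal, normal) < eps:
            continue  # collinear triple, defines no plane
        sides = [_dot(normal, _sub(p, a)) for p in points]
        if all(s <= eps for s in sides) or all(s >= -eps for s in sides):
            # Facet found: add the tetrahedron (a, b, c, centroid).
            volume += abs(_dot(normal, _sub(centroid, a))) / 6.0
    return volume


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)
```

In such a sketch, the pixels inside one detected bounding box would be passed through `deproject` to obtain a per-fruit point cloud, `convex_hull_volume` would give that fruit's volume, and `mape` would compare estimated against measured yield across test scenes.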