Accurate identification of tomato lateral shoots is essential for automated pruning and plant monitoring in greenhouse production. However, complex illumination, leaf occlusion, and morphological variability often reduce detection reliability in optical vision systems. This study proposes an optical vision-based framework that integrates deep learning perception with large language model (LLM)-assisted pruning decision support. A tomato lateral shoot image dataset was constructed using RGB imaging in greenhouse environments. A lightweight YOLOv8n instance segmentation model with the Convolutional Block Attention Module (CBAM) was developed to enhance feature representation. Data augmentation strategies were applied to simulate illumination variations and improve model robustness. Model interpretability was analyzed using Principal Component Analysis (PCA) and Gradient-weighted Class Activation Mapping (Grad-CAM). Experimental results show that the proposed YOLOv8n-seg+CBAM model achieves a mAP@0.5 of 98.1% with only 3.28M parameters and an average inference time of 8.0 ms per image. Monte Carlo Dropout was further introduced to estimate the spatial uncertainty of cutting points. These structured perception features were provided to an LLM, enabling context-aware pruning decision assistance. The proposed framework integrates vision-based shoot detection, uncertainty estimation, and LLM-assisted reasoning into a unified pipeline, enabling more reliable pruning decisions and improving safety and robustness compared with vision-only approaches in greenhouse environments.
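The Monte Carlo Dropout step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the linear prediction head, the 8-dimensional feature vector, the dropout rate, the number of passes, and the name `mc_dropout_cut_point` are all assumptions made for illustration. The idea shown is only the general technique: keep dropout active at inference, repeat the forward pass several times, and treat the spread of the predicted cutting-point coordinates as a spatial uncertainty estimate.

```python
import numpy as np

def mc_dropout_cut_point(features, weights, bias, p_drop=0.2, n_passes=50, seed=0):
    """Toy Monte Carlo Dropout: repeat a linear forward pass with dropout kept active.

    Returns the mean predicted (x, y) cutting point and its per-axis standard deviation.
    All inputs are hypothetical stand-ins for a real network head.
    """
    rng = np.random.default_rng(seed)
    preds = np.empty((n_passes, 2))
    for t in range(n_passes):
        # Sample a fresh Bernoulli mask on every pass (inverted-dropout scaling).
        mask = rng.random(features.shape) >= p_drop
        dropped = features * mask / (1.0 - p_drop)
        preds[t] = dropped @ weights + bias
    return preds.mean(axis=0), preds.std(axis=0)

# Hypothetical inputs: an 8-dimensional feature vector and a random linear head.
rng = np.random.default_rng(42)
features = rng.normal(size=8)
weights = rng.normal(size=(8, 2))
bias = np.array([320.0, 240.0])  # illustrative pixel offset, not from the paper

mean_xy, std_xy = mc_dropout_cut_point(features, weights, bias)

# Structured summary of the kind that could be handed to a downstream decision step.
summary = {
    "cut_point_px": mean_xy.round(1).tolist(),
    "uncertainty_px": std_xy.round(2).tolist(),
}
print(summary)
```

A larger standard deviation along either axis would indicate a less certain cutting point. How such values are thresholded or phrased for the LLM is not specified in the abstract, so the `summary` dictionary here is only one plausible shape for the structured perception features.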
Jiang et al. studied this question.