• Lightweight transformer (SGS-DETR) for strawberry growth-stage detection.
• CAS enhances multi-scale feature extraction under occlusion and variable illumination.
• DASI enables adaptive cross-scale fusion to improve small-target detection.
• CARAFE preserves spatial detail during upsampling to reduce missed targets.
• SGS-DETR cuts parameters by 21.2% and GFLOPs by 5.3% while improving mAP by 3.4% and F1 by 3.8% vs. RT-DETR.

Accurate detection of strawberry growth stages is vital for optimizing smart agricultural systems, enabling precise resource allocation and effective crop management. This paper proposes SGS-DETR, a lightweight transformer-based model designed for real-time strawberry growth-stage detection in greenhouse environments. The experimental dataset comprises 2153 images covering all strawberry growth stages, split at a ratio of 8:2 into 1722 training images and 431 testing images to ensure objective evaluation. To meet the growing demand for intelligent farming solutions that can handle dense planting patterns and variable environmental conditions, SGS-DETR incorporates a real-time detection transformer, balancing detection accuracy against inference speed. The model targets two challenges common in multi-target greenhouse scenarios in intelligent agriculture: elevated false-positive rates and severe occlusion. Experimental results show that SGS-DETR achieves a mean average precision (mAP@0.5) of 0.946, an mAP@0.5:0.95 of 0.782, a precision of 0.925, a recall of 0.936, an F1-score of 0.930, and an inference speed of 102.5 FPS, outperforming widely adopted baselines such as Faster R-CNN, the YOLO series, MobileViT, EfficientViT, and RT-DETR.
Furthermore, SGS-DETR remains computationally efficient, requiring only 53.9 giga floating-point operations (GFLOPs) and a model size of 15.6 MB, which gives a favorable trade-off between performance and resource utilization. This work offers a promising pathway to enhance the efficiency and sustainability of precision agriculture, strengthen food security, and accelerate the transformation of agricultural practices through AI-driven innovations.
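The reported F1-score and dataset split can be checked from the other figures in the abstract. The sketch below is only an arithmetic consistency check, not the authors' evaluation code: the helper names `f1_score` and `split_counts` are hypothetical, and rounding the training count down is an assumption about how the 8:2 split was applied.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def split_counts(total: int, train_fraction: float) -> tuple[int, int]:
    """Return (train, test) sizes, rounding the training share down (assumption)."""
    train = int(total * train_fraction)
    return train, total - train


# Values taken from the abstract.
f1 = f1_score(precision=0.925, recall=0.936)
train, test = split_counts(total=2153, train_fraction=0.8)

print(f"F1 = {f1:.3f}")
print(f"train = {train}, test = {test}")
```

Under these assumptions the computed F1 rounds to the reported 0.930, and the split reproduces the stated 1722 training and 431 testing images.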
Fan et al. studied this question.