The insects trapped by pest detection lamps differ significantly from one another, which leads to low classification accuracy in existing models for rice pests. To address this issue, this paper proposes a detection and classification model for small pest targets (ViT-YOLOv5p) that integrates the YOLO backbone with a Transformer module. First, the training set is expanded through data augmentation, and an appropriate amount of noisy data is introduced to enhance the robustness and generalization ability of the model. Before detection and classification, image cutting and stitching strategies are applied to improve the detection accuracy of small objects. The YOLO backbone then determines the bounding box of each pest, and the corresponding region is fed into the Transformer model to obtain the classification result. Finally, YOLOv5, Faster R-CNN, YOLOv4, and the proposed ViT-YOLOv5p are trained on the same dataset, with average detection time (ADT) and classification accuracy used as evaluation metrics. The results show that ViT-YOLOv5p achieves the highest classification accuracy, 91.89%, with an ADT of 50.41 ms. Compared with the commonly used Faster R-CNN, YOLOv5, and YOLOv4 models, the accuracy is improved by 1.50%, 8.71%, and 9.74%, respectively. This study provides a reference for agricultural pest detection, automatic insect classification systems, and deep learning-based detection of small agricultural targets.
Yang et al. studied this question.
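The image cutting step and the hand-off from detector to classifier can be illustrated with a minimal sketch. It shows one plausible reading of the strategy, not the authors' implementation: the image is cut into overlapping tiles so that small pests occupy more of the detector input, and each box found inside a tile is shifted back into full-image coordinates before the region is cropped for the classifier. The function names, the tile size, and the overlap are assumptions chosen for illustration; the abstract does not specify them.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def tile_origins(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of tiles that cover [0, length) along one axis."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [0]
    stride = tile - overlap
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # last tile sits flush with the border
    return origins


def cut_into_tiles(width: int, height: int,
                   tile: int = 640, overlap: int = 64) -> List[Box]:
    """Hypothetical cutting step: overlapping tiles in row-major order.

    The tile size and overlap are assumed values, not taken from the paper.
    """
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]


def to_image_coords(box: Box, tile_box: Box) -> Box:
    """Shift a box detected inside a tile back into full-image coordinates."""
    dx, dy = tile_box[0], tile_box[1]
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)
```

In such a pipeline, each tile would be passed to the detector, the resulting boxes mapped back with `to_image_coords`, and the corresponding image regions cropped and passed to the Transformer classifier. Boxes that appear twice in overlapping tiles would still need to be merged, for example with non-maximum suppression.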