April 7, 2026

Open Access

Research on Rice Pest Detection and Classification Based on YOLOv5 and Transformer Combination

Key Points

This research aims to improve the classification accuracy of rice pests using a combined YOLOv5 and Transformer model.
Developed the ViT-YOLOv5p model, which integrates a YOLO backbone with a Transformer module for pest detection and classification.
Expanded the training set through data augmentation and introduced noise data to improve robustness and generalization.
Implemented image cutting and stitching strategies to enhance detection accuracy for small pests.
Trained and evaluated the proposed model against YOLOv5, Faster R-CNN, and YOLOv4 on the same dataset.
ViT-YOLOv5p achieved a classification accuracy of 91.89%, the highest among compared models.
Recorded an average detection time (ADT) of 50.41 ms.
Improved classification accuracy by 1.50% over Faster R-CNN, 8.71% over YOLOv5, and 9.74% over YOLOv4.
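The image cutting and stitching step listed above can be sketched in Python as follows. This is one common reading of the idea, not the paper's published procedure: a large trap image is cut into overlapping tiles so small pests occupy more of each detector input, and per-tile detections are shifted back to full-image coordinates and de-duplicated with greedy non-maximum suppression. The tile size (640), overlap (128), IoU threshold (0.5), and the function names `tile_origins`, `cut`, `to_global`, `iou`, and `stitch` are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Detection = Tuple[Box, float]            # (box, confidence score)


def tile_origins(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets along one axis so that tiles of size `tile` cover [0, length)."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    if tile >= length:
        return [0]  # the image already fits in a single tile
    stride = tile - overlap
    origins = list(range(0, length - tile + 1, stride))
    if origins[-1] + tile < length:
        origins.append(length - tile)  # shift the last tile back to touch the border
    return origins


def cut(width: int, height: int, tile: int = 640, overlap: int = 128) -> List[Tuple[int, int]]:
    """Top-left (x, y) corner of every tile, row by row (assumed defaults)."""
    return [(x, y)
            for y in tile_origins(height, tile, overlap)
            for x in tile_origins(width, tile, overlap)]


def to_global(box: Box, origin: Tuple[int, int]) -> Box:
    """Shift a box from tile coordinates back into full-image coordinates."""
    x0, y0 = origin
    x1, y1, x2, y2 = box
    return (x1 + x0, y1 + y0, x2 + x0, y2 + y0)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def stitch(tile_dets: Sequence[Tuple[Tuple[int, int], Sequence[Detection]]],
           iou_thr: float = 0.5) -> List[Detection]:
    """Map per-tile detections to global coordinates, then apply greedy NMS so a
    pest seen in two overlapping tiles is reported only once."""
    dets = [(to_global(box, origin), score)
            for origin, boxes in tile_dets
            for box, score in boxes]
    dets.sort(key=lambda d: d[1], reverse=True)  # highest confidence first
    kept: List[Detection] = []
    for box, score in dets:
        if all(iou(box, k) < iou_thr for k, _ in kept):
            kept.append((box, score))
    return kept
```

For a 1000 x 700 image, `cut(1000, 700)` gives four tiles with corners (0, 0), (360, 0), (0, 60), and (360, 60); running a detector on each tile and passing the results to `stitch` yields one box list for the whole image. The overlap is the design lever here: any pest no wider or taller than the overlap lies entirely inside at least one tile, so it is not split by a tile border.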

Abstract

The large variability among insects trapped by pest detection lamps leads to low classification accuracy of existing models for rice pests. To address this issue, this paper proposes a small pest target detection and classification model (ViT-YOLOv5p) that integrates a YOLO backbone with a Transformer module. First, the number of training samples is expanded through data augmentation during model training. Appropriate noise data are also introduced to enhance the robustness and generalization ability of the model. Before detection and classification, image cutting and stitching strategies are adopted to improve the detection accuracy for small objects. The YOLO backbone determines the bounding box of each pest, and the corresponding region is fed into the Transformer model to obtain the classification result. Finally, YOLOv5, Faster R-CNN, YOLOv4, and the proposed ViT-YOLOv5p are trained on the same dataset, with average detection time (ADT) and classification accuracy used as evaluation metrics. The results show that ViT-YOLOv5p achieves the highest classification accuracy, 91.89%, with an ADT of 50.41 ms. Compared with the commonly used Faster R-CNN, YOLOv5, and YOLOv4 models, the accuracy is improved by 1.50%, 8.71%, and 9.74%, respectively. This study provides a reference for agricultural pest detection, automatic insect classification systems, and deep learning-based detection of small agricultural targets.
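The two-stage flow in the abstract (a detector proposes pest boxes, then each cropped region is classified) and the two evaluation metrics can be sketched in Python as below. This is a minimal sketch under stated assumptions: `toy_detect` and `toy_classify` are hypothetical stand-ins for the YOLO backbone and the Transformer classifier, no real model is loaded, images are simplified to nested lists of grayscale pixels, and the crop padding, the wall-clock timing behind ADT, and the per-pest accuracy definition are illustrative choices rather than details taken from the paper.

```python
import time
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]          # H x W grayscale pixels (simplified stand-in for a photo)
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), integer pixel coordinates
Detector = Callable[[Image], List[Box]]
Classifier = Callable[[Image], str]


def crop(image: Image, box: Box, pad: int = 0) -> Image:
    """Cut out the region inside `box`, optionally padded, clamped to the image."""
    h, w = len(image), len(image[0])
    x1, y1, x2, y2 = box
    x1, y1 = max(0, x1 - pad), max(0, y1 - pad)
    x2, y2 = min(w, x2 + pad), min(h, y2 + pad)
    return [row[x1:x2] for row in image[y1:y2]]


def detect_and_classify(image: Image, detect: Detector,
                        classify: Classifier) -> List[Tuple[Box, str]]:
    """Stage 1 proposes pest boxes; stage 2 labels each cropped region on its own."""
    return [(box, classify(crop(image, box))) for box in detect(image)]


def average_detection_time_ms(images: Sequence[Image], detect: Detector,
                              classify: Classifier) -> float:
    """Mean wall-clock time per image for the whole two-stage pipeline, in ms."""
    start = time.perf_counter()
    for image in images:
        detect_and_classify(image, detect, classify)
    return (time.perf_counter() - start) * 1000.0 / len(images)


def classification_accuracy(predicted: Sequence[str], truth: Sequence[str]) -> float:
    """Fraction of pests whose predicted class matches the ground-truth label."""
    if not truth or len(predicted) != len(truth):
        raise ValueError("need equally long, non-empty label lists")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical toy stand-ins: a bright 2 x 2 "pest" at (1, 1) in an 8 x 8 image.
toy_image = [[255 if 1 <= x < 3 and 1 <= y < 3 else 0 for x in range(8)] for y in range(8)]


def toy_detect(image: Image) -> List[Box]:
    """Pretend detector: always returns the same two boxes."""
    return [(1, 1, 3, 3), (5, 5, 7, 7)]


def toy_classify(patch: Image) -> str:
    """Pretend classifier: any bright pixel means 'pest'."""
    return "pest" if sum(map(sum, patch)) > 0 else "background"
```

With these stand-ins, `detect_and_classify(toy_image, toy_detect, toy_classify)` labels the first box `"pest"` and the second `"background"`. Keeping the classifier behind a crop means it only ever sees one insect at a time, which is the motivation the abstract gives for handing each detected region to a separate classification model.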

Cite This Study

Yang et al. (2026).

https://doi.org/10.3390/agriengineering8040138