Fine-grained pest recognition is a key component of intelligent pest monitoring and precise control, and it is important for safeguarding agricultural production. This paper proposes PAFT-WPest, a pest recognition model based on generative self-supervised learning, to address the main challenges of fine-grained pest recognition: small inter-class differences, large intra-class variations, complex background interference, and limited annotated data. The model employs partial-convolution spatial attention to focus on pest regions while suppressing redundant background information. Channel semantic selection and frequency-domain modeling are introduced to enhance the model's ability to perceive subtle differences in detail. In addition, the model captures dependencies among different parts of the pest body to improve the modeling of global structure and semantic information. Furthermore, two fine-grained wolfberry pest datasets that distinguish pest growth stages and damage locations are constructed, and a continual pre-training strategy is adopted to enhance cross-scenario adaptability. Experimental results show that PAFT-WPest achieves accuracies of 76.83%, 91.53%, 98.70%, 79.27%, and 97.34% on the public pest datasets IP102, Butterfly-200, WPIT9K, Rice Pest, and Jute Pest, respectively, and accuracies of 97.82% and 94.69% on the self-built wolfberry pest datasets WP45 and WP11, respectively. These results indicate that the proposed model can improve fine-grained pest recognition performance under complex backgrounds, providing a feasible approach for agricultural pest monitoring and classification.
Liu et al. (Sun,) studied this question.