Plant diseases pose significant challenges to global agricultural productivity, and early, accurate detection is needed to mitigate their impact. This study introduces a Vision Transformer (ViT)-based framework for classifying apple leaf diseases. It uses a novel multi-patch selection approach to balance feature extraction against computational efficiency. Leaf images were preprocessed for compatibility with the ViT framework and divided into patches of different sizes (32 × 32, 16 × 16, and 8 × 8). This lets the model capture both local and global features when classifying four categories: Apple Scab, Black Rot, Cedar Rust, and Healthy Leaves. With the 16 × 16 patch configuration, the proposed Plant Leaf Disease Vision Transformer (PLD-ViT) achieves 97.76% validation accuracy, 95.34% precision, 95.11% recall, and 95.06% F1-score. Its validation accuracy exceeds that of ResNet-50 (96.75% validation accuracy, 95.11% precision, 93.42% recall, 95.31% F1-score) and Swin Transformer (96.79% validation accuracy, 95.76% precision, 95.94% recall, 95.89% F1-score). Swin Transformer, however, reports higher precision, recall, and F1-score, and ResNet-50 reports a higher F1-score. PLD-ViT also requires less computation, with relative GFLOPs of 1.0× versus 3.8× and 4.5×, respectively. The model reliably classifies distinct categories but has difficulty distinguishing visually similar ones, such as Cedar Rust and Healthy Leaves. Its limitations include reliance on controlled datasets and the higher computational demands of smaller patch sizes. This ViT-based framework offers a practical and scalable solution for precision agriculture, laying the groundwork for accessible tools that support sustainable farming practices and global food security.
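The trade-off between patch size and computation can be illustrated with a short sketch. It assumes a 224 × 224 RGB input and non-overlapping patches as in the standard ViT; the abstract does not state the input resolution. The names `patchify` and `relative_attention_cost` are hypothetical helpers, not part of the authors' PLD-ViT code. The cost estimate covers only the self-attention score term, which grows with the square of the token count. It ignores the class token and the linear layers, which scale linearly, so real GFLOPs grow more slowly than this ratio suggests. It is also unrelated to the 1.0×/3.8×/4.5× comparison with ResNet-50 and Swin Transformer.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping, flattened patches.

    Returns shape (num_patches, patch_size * patch_size * C): the token
    sequence a ViT would map to embeddings with a linear projection.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (H, W, C) -> (gh, P, gw, P, C) -> (gh, gw, P, P, C) -> (gh*gw, P*P*C)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def relative_attention_cost(num_tokens: int, reference_tokens: int) -> float:
    """Ratio of self-attention score cost, which scales with num_tokens ** 2."""
    return (num_tokens / reference_tokens) ** 2


if __name__ == "__main__":
    # Assumed 224 x 224 RGB input; the 16 x 16 setting is the cost reference.
    image = np.zeros((224, 224, 3), dtype=np.float32)
    reference = patchify(image, 16).shape[0]
    for p in (32, 16, 8):
        tokens = patchify(image, p)
        print(p, tokens.shape, relative_attention_cost(tokens.shape[0], reference))
    # 32 (49, 3072) 0.0625
    # 16 (196, 768) 1.0
    # 8 (784, 192) 16.0
```

Under these assumptions, halving the patch side from 16 to 8 quadruples the sequence length. The attention term then grows sixteenfold, which is consistent with the abstract's note that smaller patches are computationally demanding.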
This study is by Batool et al.