Banana leaf diseases such as Cordana, Sigatoka, and Pestalotiopsis significantly reduce crop yield and quality, making accurate early detection essential for effective management. This study proposes ResViT HybridNet, a deep learning framework that integrates ResNet50 for spatial feature extraction with a Vision Transformer (ViT) for global context modelling, bridged by a Hybrid Pool Block (HPB) that preserves spatial locality. On a dataset of 2537 images across four classes, the model achieved an overall accuracy of 99.21%, a precision of 0.9843, a recall of 0.9921, and an F1-score of 0.9881, outperforming conventional CNN and transformer-based models. Ablation and statistical significance analyses indicate that the CNN and transformer components are complementary. These results suggest that ResViT HybridNet is a robust and accurate approach to automatic banana leaf disease identification, with strong potential for deployment in real-world agricultural disease monitoring and crop management systems.
Kant et al. studied this question.
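
As a quick consistency check on the metrics reported in the abstract, the following minimal sketch recomputes the F1-score as the harmonic mean of the reported precision and recall. It assumes the reported figures are aggregate values; the abstract does not state the averaging scheme (for example macro or weighted), so an exact match is not expected. The function name `f1_from_precision_recall` is illustrative and not taken from the paper.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract.
reported_precision = 0.9843
reported_recall = 0.9921
reported_f1 = 0.9881

# Assumption: precision and recall are aggregate values, so the harmonic
# mean only approximates an F1 that was averaged per class.
recomputed_f1 = f1_from_precision_recall(reported_precision, reported_recall)
gap = abs(recomputed_f1 - reported_f1)

print(f"recomputed F1 = {recomputed_f1:.4f}, gap to reported = {gap:.4f}")
```

A small gap between the recomputed and reported values is what one would expect if the F1-score was averaged per class rather than derived from the aggregate precision and recall.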