What question did this study set out to answer?

The study aims to enhance the classification of apple leaf diseases using a hybrid CNN-ViT framework.

April 23, 2026 · Open Access

Cross-model feature fusion and weighted attention mechanism for apple leaf disease classification

Key Points

Set out to improve apple leaf disease classification with a hybrid CNN-ViT framework.
Developed a hybrid model named ResViT-AM that integrates CNN and vision transformer features.
Implemented a weighted attention mechanism for adaptive feature integration.
Evaluated the model on the public AppleLeaf dataset under real-world conditions.
Achieved a top-1 accuracy of 99.14% on the AppleLeaf test split.
Showed greater stability and adaptability than representative CNN baselines on challenging cases.
Indicated promising classification performance under complex orchard conditions.
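
For readers unfamiliar with the metric, top-1 accuracy is the fraction of test images whose highest-scoring predicted class equals the true label. The sketch below shows that computation in plain Python. The function name and the toy scores are illustrative assumptions and are not taken from the paper or the AppleLeaf dataset.

```python
def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the true label.

    scores: list of per-class score lists, one list per sample.
    labels: list of integer class indices, one per sample.
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and the same length")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score is the top-1 prediction.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Toy, made-up scores for three samples and two classes.
toy_scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
toy_labels = [1, 0, 0]
print(top1_accuracy(toy_scores, toy_labels))
```

In the toy data the third sample is misclassified, so two of the three predictions count as correct.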

Abstract

Apple leaf diseases (ALD) pose a significant challenge to global apple production, and identifying them accurately is crucial for reducing pesticide use and improving fruit quality, particularly in smart agriculture. However, traditional approaches rely on single-model feature extraction and fail to account for relationships between different tasks, which limits their applicability in the apple industry. To address this, we design an optimized convolutional neural network–vision transformer (CNN–ViT) hybrid framework named ResViT-AM. Rather than proposing a completely new structure, this work refines existing CNN–Transformer paradigms through task-oriented feature fusion and adaptive attention weighting, tailored to apple leaf disease classification under complex orchard conditions. A weighted attention fusion mechanism dynamically integrates features extracted by Residual Network 101 (ResNet-101) and a vision transformer (ViT), blending the local convolutional details of ResNet with the global contextual features of ViT. This approach enhances the model’s representation capability and allows parallel processing of multiple tasks, thereby saving training time and computational resources. We evaluate on the public AppleLeaf dataset, which reflects real-world outdoor conditions. On its held-out test split, our model achieves 99.14% top-1 accuracy, indicating promising performance under complex orchard conditions. Compared with representative convolutional baselines, ResViT-AM shows greater stability and adaptability on challenging cases, offering a competitive and practical solution for automated apple leaf disease diagnosis.
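
The abstract does not spell out the fusion formula, so the following is only a minimal sketch of one common way to realize weighted attention fusion of two feature branches. It is not the authors' implementation. It assumes both branch features have already been projected to the same length, and it uses hypothetical scoring vectors (`w_cnn`, `w_vit`) standing in for learned parameters. Each branch gets a scalar score, a softmax turns the two scores into weights that sum to 1, and the fused feature is the weighted sum.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_attention_fusion(cnn_feat, vit_feat, w_cnn, w_vit):
    """Fuse two equal-length feature vectors with input-dependent weights.

    cnn_feat, vit_feat: branch features (assumed already projected to one size).
    w_cnn, w_vit: hypothetical scoring vectors standing in for learned weights.
    Returns the fused vector and the (cnn, vit) attention weights.
    """
    if not (len(cnn_feat) == len(vit_feat) == len(w_cnn) == len(w_vit)):
        raise ValueError("all vectors must have the same length")
    # One scalar relevance score per branch (dot product with its scoring vector).
    score_cnn = sum(f * w for f, w in zip(cnn_feat, w_cnn))
    score_vit = sum(f * w for f, w in zip(vit_feat, w_vit))
    # Softmax makes the two branch weights positive and sum to 1.
    alpha_cnn, alpha_vit = softmax([score_cnn, score_vit])
    fused = [alpha_cnn * c + alpha_vit * v for c, v in zip(cnn_feat, vit_feat)]
    return fused, (alpha_cnn, alpha_vit)


# Toy, made-up features; zero scoring vectors give both branches equal weight.
fused, weights = weighted_attention_fusion(
    [1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]
)
print(fused, weights)
```

Because the weights depend on the input features, the balance between local (CNN) and global (ViT) information can shift from image to image, which is the adaptive behavior the abstract describes. A trainable version would learn the scoring parameters jointly with the classifier.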
