Plant disease detection using Artificial Intelligence (AI) has become a critical area of research due to its potential to improve crop yields, reduce losses and support sustainable agriculture. Despite significant progress, existing approaches often lack a unified framework that balances accuracy, efficiency and interpretability. Complex models such as Vision Transformers (ViT) and multimodal systems demand high computational resources, while lightweight networks struggle with robustness under data scarcity and domain shift. Addressing these open challenges, we propose Cropper, an ensemble-based framework for agricultural leaf disease classification that integrates structural model diversity, attention-based refinement and interpretability within a computationally efficient architecture. We evaluate Cropper on seven publicly available leaf image datasets, covering both binary and multiclass classification tasks and compare its performance against five state-of-the-art baseline methods, including ViTs. The experimental results demonstrate that Cropper consistently outperforms all baselines across all datasets, achieving statistically significant improvements in five different performance metrics, while substantially reducing computational overhead compared to existing ensemble and large single-backbone architectures. Cropper also demonstrates enhanced robustness under data-scarce conditions and provides interpretable outputs that enhance trustworthiness in decision-making. By addressing the trade-offs between accuracy, efficiency and interpretability, Cropper offers a practical and scalable solution for real-world agricultural AI systems, supporting deployment in resource-constrained environments and advancing the adoption of trustworthy AI in farming practices.

• Introduces Cropper: an attention-guided deep ensemble for crop disease detection.
• Combines diverse CNN backbones with a projection layer and dual-channel attention.
• Uses spatial and channel attention to localize fine-grained disease features.
• Outperforms state-of-the-art on four datasets under small, imbalanced conditions.
• Incorporates attention-based visualization for interpretability and transparency.
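The highlights above name three building blocks: fused features from several CNN backbones, a projection layer, and channel plus spatial attention. The following is a minimal, dependency-free sketch of how such a pipeline might fit together; it is not Cropper's actual implementation. Everything specific here is an assumption made for illustration: the function names, the feature-map sizes, the 1x1-convolution-style projection, the parameter-free sigmoid gates, and the channel-then-spatial ordering. The backbone outputs are random stand-ins rather than real CNN features.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def concat_channels(feature_maps):
    """Stack per-backbone feature maps (lists of HxW channels) along the channel axis."""
    return [channel for fmap in feature_maps for channel in fmap]


def project(channels, weights):
    """1x1-convolution-style projection: each output channel is a
    per-pixel weighted sum of all input channels (assumed design)."""
    h, w = len(channels[0]), len(channels[0][0])
    return [
        [[sum(wk * ch[i][j] for wk, ch in zip(row, channels)) for j in range(w)]
         for i in range(h)]
        for row in weights
    ]


def channel_attention(channels):
    """Scale each channel by a sigmoid of its global average.
    A learned gating network is omitted in this sketch."""
    out = []
    for ch in channels:
        mean = sum(sum(r) for r in ch) / (len(ch) * len(ch[0]))
        gate = sigmoid(mean)
        out.append([[gate * v for v in r] for r in ch])
    return out


def spatial_attention(channels):
    """Scale each pixel by a sigmoid of its mean across channels.
    Returns the refined channels and the HxW gate map."""
    h, w = len(channels[0]), len(channels[0][0])
    gates = [
        [sigmoid(sum(ch[i][j] for ch in channels) / len(channels)) for j in range(w)]
        for i in range(h)
    ]
    refined = [
        [[gates[i][j] * ch[i][j] for j in range(w)] for i in range(h)]
        for ch in channels
    ]
    return refined, gates


def random_map(rng, c, h, w):
    """Hypothetical stand-in for a backbone's C x H x W feature map."""
    return [[[rng.uniform(-1, 1) for _ in range(w)] for _ in range(h)] for _ in range(c)]


rng = random.Random(0)
backbone_a = random_map(rng, 3, 4, 4)  # e.g. a small CNN with 3 output channels
backbone_b = random_map(rng, 5, 4, 4)  # a second, structurally different CNN
fused = concat_channels([backbone_a, backbone_b])  # 8 channels in total
weights = [[rng.uniform(-0.5, 0.5) for _ in range(len(fused))] for _ in range(2)]
projected = project(fused, weights)  # reduced to 2 channels
refined, attention_map = spatial_attention(channel_attention(projected))
```

In a sketch like this, `attention_map` is the kind of per-pixel weighting that could be upsampled and overlaid on the input leaf image for visualization, while `refined` would feed a classification head. In a trained model the projection weights and both gates would be learned rather than fixed.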
Guo et al. (Sun,) studied this question.