This paper presents a machine learning-driven framework for analyzing and predicting potato late blight (caused by Phytophthora infestans) in six potato varieties grown under two distinct cultivation systems, ecological and integrated. Traditional statistical methods, including a two-factor Analysis of Variance (ANOVA) and Tukey's Honest Significant Difference (HSD) test, were applied to assess the effects of cultivation system, potato variety, and year. To improve predictive accuracy and model interpretability, a machine learning pipeline, termed the Eco-Integrated Model, was developed. The model combines SMOTE (Synthetic Minority Oversampling Technique) to handle class imbalance, SHAP (SHapley Additive exPlanations) for interpretability and feature importance analysis, and the CatBoost classifier for prediction. The dataset, collected over three years (2018-2020), comprises variety-specific and system-specific records of late blight incidence for both the ecological and the integrated system, and serves as input for model training and evaluation. The proposed Eco-Integrated Model demonstrated high predictive capability and indicated that integrated cultivation systems are generally more effective at suppressing disease progression. Both the statistical and the machine learning analyses also identified substantial differences among varieties in late blight susceptibility. These findings underline the value of incorporating explainable, data-driven approaches into plant disease forecasting. The Eco-Integrated Model offers a scalable, interpretable, and accurate predictive tool that contributes to precision agriculture and supports evidence-based decision-making for sustainable potato production and disease management.
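To illustrate the class-imbalance step named above, the following is a minimal sketch of the interpolation idea behind SMOTE, written with the Python standard library only. It is not the authors' implementation. The function name `smote_oversample`, its parameters, and the feature values are hypothetical and chosen purely for illustration. In practice a library implementation would normally be used.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority-class points by SMOTE-style interpolation.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature records for the under-represented class.
minority = [(0.10, 0.80), (0.20, 0.90), (0.15, 0.70), (0.30, 0.85)]
new_points = smote_oversample(minority, n_new=6)
```

Because every synthetic record is a convex combination of two real minority records, the oversampled class stays inside the region already occupied by observed data. The synthetic records would be added to the training split only, so that evaluation still reflects the original class balance.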
Bagchi et al. (Thu,) studied this question.