This study aimed to develop an interpretable machine learning model, based on explainable artificial intelligence (XAI), for predicting spring wheat yield. It used data from a long-term field experiment (2001–2024) in the forest-steppe zone of the Altai Ob region. The experiment evaluated how preceding crops, primary tillage practices on leached chernozem soil, and levels of mineral fertilization and chemical plant protection affect yield formation in spring soft wheat. Yield was predicted with Extreme Gradient Boosting (XGBoost), and SHapley Additive exPlanations (SHAP) were used to interpret the model and quantify each feature's contribution. The XGBoost model achieved high predictive accuracy (R² = 0.95, MAE = 0.13 t/ha, RMSE = 0.17 t/ha), and SHAP analysis identified the most influential features (5–6 of the 18) determining crop yield in the forest-steppe zone of the Altai Ob region. High predicted yield was primarily associated with the following (feature value; SHAP contribution): sufficient precipitation during the agricultural year (596.5 mm; +1.19 t/ha), fallow as the preceding crop (+0.58 t/ha), and application of nitrogen–phosphorus fertilizers (+0.21 t/ha). Low predicted yield was associated with moisture deficit during the agricultural year (317 mm; –0.77 t/ha) and during May–October (246 mm; –0.24 t/ha), high sums of positive temperatures (2527.5 °C; –0.13 t/ha), low precipitation during the growing season (175 mm; –0.10 t/ha), and the absence of plant protection (–0.10 t/ha). By improving both the reliability and the interpretability of yield forecasts, the proposed approach makes machine learning more practically applicable.
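The reported quantities can be made concrete with the minimal, dependency-free Python sketch below. It computes R², MAE and RMSE from observed and predicted yields. It also computes exact Shapley values by enumerating every feature coalition, which is the quantity SHAP estimates. This is not the study's pipeline. The toy function `toy_yield`, its coefficients, the three features and the reference point are all hypothetical and are not fitted to the experiment's data. For simplicity, features outside a coalition are set to a single reference value. SHAP's tree explainer instead averages over background data.

```python
from itertools import combinations
from math import factorial, sqrt


def regression_metrics(y_true, y_pred):
    """Return (R2, MAE, RMSE) for paired observed and predicted yields (t/ha)."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)             # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(r) for r in residuals) / n
    rmse = sqrt(ss_res / n)
    return r2, mae, rmse


def shapley_values(model, x, reference):
    """Exact Shapley values of model(x) relative to a single reference point.

    Features outside a coalition take their reference value (a simplifying
    assumption; SHAP averages over a background dataset instead).
    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)
    phi = [0.0] * n

    def value(coalition):
        z = [x[i] if i in coalition else reference[i] for i in range(n)]
        return model(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of size k
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_yield(z):
    """Hypothetical yield model (t/ha): precipitation (mm), fallow flag, NP flag.

    Coefficients are invented for illustration only.
    """
    precip, fallow, np_fert = z
    return 1.0 + 0.004 * precip + 0.5 * fallow + 0.2 * np_fert + 0.1 * fallow * np_fert


if __name__ == "__main__":
    x = [596.5, 1, 1]          # wet year, fallow predecessor, NP fertilizer applied
    reference = [400.0, 0, 0]  # hypothetical reference conditions
    phi = shapley_values(toy_yield, x, reference)
    print("contributions (t/ha):", [round(p, 3) for p in phi])
    print("sum:", round(sum(phi), 3),
          "prediction - reference:", round(toy_yield(x) - toy_yield(reference), 3))
    print("R2, MAE, RMSE:", regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

For these toy inputs, the three contributions add up exactly to the gap between the prediction and the reference prediction. This additivity is why SHAP values can be read directly as yield gains or losses in t/ha. The fallow/fertilizer interaction term is split equally between the two features. Because brute-force enumeration is exponential in the number of features, a model with 18 features would in practice rely on a tree-specific algorithm such as TreeSHAP.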
Kalichkin et al. (Sun,) studied this question.