What question did this study set out to answer?

This study aims to improve crop yield prediction during droughts by optimizing machine learning models and assessing their transferability across different regions.

February 28, 2026 · Open Access

Predictive drivers and transferability of multi-scale machine learning based crop yield prediction under drought across European and Asian climates

Key Points

Optimized a Random Forest regressor with two training strategies: random and hydrologically balanced.
Applied models in Bundelkhand, India, and Germany for wheat and barley crops.
Evaluated predictions at two spatial scales: lumped and distributed.
Analyzed spatial transferability by applying local models across different sites.
In Bundelkhand, at the lumped scale, the balanced approach improved RMSE by 14.66% and MAPE by 29.23% for wheat, and by 36.10% (RMSE) and 40% (MAPE) for barley.
In Germany, the balanced training improved RMSE by 4.59% and MAPE by 6.37% for wheat, but decreased performance for barley.
NDVI (Normalized Difference Vegetation Index) was identified as the most important predictor across locations and scales.
Adding local features during retraining may enhance model performance in new regions.
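
The contrast between the random and the hydrologically balanced training strategies can be illustrated with a minimal sketch. It assumes that "balanced" means drought and non-drought years are split separately, so that both the training and the test set contain a similar share of drought years; the summary does not spell out the authors' exact procedure. The function names, the 25% test fraction, and the list of drought years are hypothetical and purely illustrative.

```python
import random


def random_split(years, test_fraction=0.25, seed=0):
    """Shuffle all years together and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = list(years)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return sorted(shuffled[n_test:]), sorted(shuffled[:n_test])


def balanced_split(years, is_drought, test_fraction=0.25, seed=0):
    """Split drought and non-drought years separately (assumed reading of
    'hydrologically balanced'), so both sets keep a similar drought share."""
    rng = random.Random(seed)
    train, test = [], []
    for flag in (True, False):
        group = [y for y in years if is_drought[y] == flag]
        rng.shuffle(group)
        # A group with a single year goes entirely to training.
        n_test = max(1, round(len(group) * test_fraction)) if len(group) > 1 else 0
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return sorted(train), sorted(test)


# Hypothetical record: 20 years, 4 of them flagged as drought (not real data).
years = list(range(2001, 2021))
drought_years = {2002, 2009, 2015, 2018}
is_drought = {y: y in drought_years for y in years}

train_years, test_years = balanced_split(years, is_drought)
random_train, random_test = random_split(years)
```

With a purely random split, the four drought years could all land in the training set or all in the test set. The balanced split guarantees that at least one drought year is held out for evaluation while the rest remain available for training.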

Abstract

Food security is increasingly critical as the global population grows. Accurate crop yield prediction enables policymakers to anticipate climate shocks, such as droughts, and manage food reserves proactively. Machine learning models struggle to capture the complexity of real-world conditions, particularly during extreme events such as droughts, because these events are often underrepresented in the training data. This study addresses the limited predictability of crop yield during hydrological extremes, particularly droughts, by optimizing a Random Forest regressor using two training strategies: a random approach and a hydrologically balanced approach. The model was applied in the Bundelkhand region of India and in Germany, for wheat and barley crops. Predictions were made at two spatial scales, lumped and distributed, to assess the differences in predictive drivers across scales. In Bundelkhand, at the lumped scale, the balanced approach improved Root Mean Square Error (RMSE) by 14.66% and Mean Absolute Percentage Error (MAPE) by 29.23% for wheat, and by 36.10% (RMSE) and 40% (MAPE) for barley. In Germany, the performance gains from balancing were smaller: 4.59% (RMSE) and 6.37% (MAPE) for wheat, with a decrease in performance for barley. The study also analyzed spatial transferability by applying regionally trained models across sites without retraining. Results suggest that adding local features during retraining may enhance model performance in new regions. Overall, this work demonstrates the value of combining reanalysis data with ensemble machine learning for accurate crop yield prediction, offering insights to support food reserve planning and mitigate the impacts of climate extremes on food security.
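
The two error metrics named in the abstract, and a percentage improvement between two training strategies, can be computed as in the sketch below. It assumes that "improved by X%" refers to the relative reduction in error compared with the random-training baseline, which the abstract does not state explicitly. The yield values are made up for illustration and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(observed, predicted):
    """Mean Absolute Percentage Error, in percent (observed values must be non-zero)."""
    ratios = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def relative_improvement(baseline_error, new_error):
    """Percent reduction in error relative to the baseline (assumed definition)."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Illustrative yields in t/ha (hypothetical, not from the paper).
observed = [2.0, 4.0]
pred_random = [1.0, 5.0]      # predictions from a randomly trained model
pred_balanced = [1.5, 4.5]    # predictions from a balanced-trained model

rmse_gain = relative_improvement(rmse(observed, pred_random),
                                 rmse(observed, pred_balanced))
mape_gain = relative_improvement(mape(observed, pred_random),
                                 mape(observed, pred_balanced))
```

Under this definition a positive value means the balanced strategy lowered the error, and a negative value corresponds to the decrease in performance reported for barley in Germany.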
• Annual crop yields predicted using climate reanalysis data and random forests.
• Multiple training strategies, agroclimatic conditions, and spatial scales are evaluated.
• Training strategy with balanced drought incidents in training data is optimal.
• NDVI is the most important predictor of crop yield across locations and scales.
• Spatial transferability assessment indicates a need for model retraining.
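
A transferability check of the kind described here, applying a model trained in one region to another without retraining, might look like the following sketch using scikit-learn's `RandomForestRegressor`. Everything about the data is an assumption: the two regions are synthetic, the feature names `ndvi`, `precip`, and `temp` are placeholders, and the yield relationship is constructed so that NDVI dominates. The output therefore illustrates the workflow only and is not evidence for the paper's findings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
feature_names = ["ndvi", "precip", "temp"]  # hypothetical predictor names


def make_region(n, ndvi_weight, noise):
    """Generate synthetic features and yields for one region (illustrative only)."""
    X = rng.uniform(0.0, 1.0, size=(n, 3))
    y = ndvi_weight * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0.0, noise, size=n)
    return X, y


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Two synthetic regions whose yield response to NDVI differs.
X_a, y_a = make_region(200, ndvi_weight=3.0, noise=0.1)
X_b, y_b = make_region(200, ndvi_weight=1.5, noise=0.1)

# Train on the first 150 samples of region A only.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_a[:150], y_a[:150])

# Local skill: held-out samples from region A.
rmse_local = rmse(y_a[150:], model.predict(X_a[150:]))
# Transfer skill: region B, with no retraining.
rmse_transfer = rmse(y_b, model.predict(X_b))

# Impurity-based feature importances from the fitted forest.
importances = dict(zip(feature_names, model.feature_importances_))
```

Comparing `rmse_local` with `rmse_transfer` gives a simple measure of how much skill is lost when the model leaves its training region, and `importances` shows which predictors the forest relied on.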
