Abstract

Purpose: Accurate and early prediction of crop yield is essential for agricultural management, economic planning, and market stability, particularly for high-value products such as wine. This study presents a multi-temporal modeling framework to predict grapevine yield (in kilograms) in the province of Cádiz, Spain, a region with a deep-rooted winemaking heritage.

Methods: Using a 12-year dataset that includes historical harvest records, meteorological variables, and time series of remotely sensed vegetation indices, Machine Learning (ML) regression models were developed and evaluated at three key phenological stages: post-harvest (December), post-dormancy (March), and post-flowering (June). Model performance is assessed with Leave-One-Year-Out (LOYO) cross-validation, an approach suited to short time series.

Results: Prediction accuracy improves over the growing season: the Mean Absolute Percentage Error (MAPE) decreases from 13.9% in the December forecast to 13.8% in March and 10.9% in June. A sequential agro-physiological narrative emerges: the previous year's yield and summer precipitation establish a baseline yield potential, which is subsequently refined by winter temperatures, identified by the March model as a key driver. Beyond forecasting, the models provide explanatory insight by quantifying the influence of phenological and climatic factors at each stage.

Conclusion: This work demonstrates that high predictability can be achieved well before harvest, offering critical information for strategic decision-making in viticulture under Mediterranean climate conditions.
Cubillas et al. studied this question.
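The Leave-One-Year-Out validation and the MAPE metric named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are invented, the single predictor `prev_yield` is hypothetical, and a one-variable least-squares line stands in for the ML regressors, which the abstract does not specify. Whether the study pools errors across held-out years or averages per-year scores is also not stated; this sketch pools them.

```python
from statistics import mean


def mape(actual, predicted):
    """Mean Absolute Percentage Error, expressed in percent."""
    return 100.0 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, y_bar - slope * x_bar


def loyo_mape(rows):
    """Leave-One-Year-Out: hold out each year once, train on all the others.

    rows is a list of (year, feature, yield_kg) tuples.
    Errors from all held-out years are pooled into a single MAPE.
    """
    actual, predicted = [], []
    for held_out in sorted({year for year, _, _ in rows}):
        train = [r for r in rows if r[0] != held_out]
        test = [r for r in rows if r[0] == held_out]
        slope, intercept = fit_line([r[1] for r in train], [r[2] for r in train])
        for _, x, y in test:
            actual.append(y)
            predicted.append(slope * x + intercept)
    return mape(actual, predicted)


# Invented 12-year example: (year, prev_yield, yield_kg). Not data from the study.
rows = [
    (2010, 5200.0, 5400.0), (2011, 5400.0, 5100.0), (2012, 5100.0, 4700.0),
    (2013, 4700.0, 5600.0), (2014, 5600.0, 5900.0), (2015, 5900.0, 5300.0),
    (2016, 5300.0, 5000.0), (2017, 5000.0, 4600.0), (2018, 4600.0, 5800.0),
    (2019, 5800.0, 5500.0), (2020, 5500.0, 5200.0), (2021, 5200.0, 5700.0),
]
print(f"LOYO MAPE: {loyo_mape(rows):.1f}%")
```

Holding out a whole year at a time keeps every observation from the test season out of training, which is the property that matters when only about a dozen seasons are available.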