This study assesses the reliability of regular and spatial cross-validation in predicting subfield-scale maize yields from phenological measures derived from Sentinel-2. Three maize fields in eastern Croatia were monitored during the 2023 growing season, with high-resolution ground-truth yield data collected by combine harvester sensors. Sentinel-2 time series were used to compute two vegetation indices, the Enhanced Vegetation Index (EVI) and the Wide Dynamic Range Vegetation Index (WDRVI). The resulting features served as inputs for three machine learning models, including Random Forest (RF) and Bayesian Generalized Linear Model (BGLM), which were trained and evaluated using both regular and spatial 10-fold cross-validation. Spatial cross-validation produced a more realistic and conservative estimate of model performance, whereas regular cross-validation systematically overestimated predictive accuracy because of spatial dependence among the samples. EVI-based models were more reliable than WDRVI-based models, yielding more accurate phenological fits and yield predictions across parcels. These results emphasize the importance of spatially explicit validation for subfield yield modeling and show that overlooking spatial structure can lead to misleading conclusions about model accuracy and generalizability.
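As a brief illustration of the two vegetation indices named above, the following minimal sketch computes EVI and WDRVI from surface reflectance values using their commonly published formulas. The function names, the example reflectances, and the WDRVI weighting coefficient `alpha = 0.1` are assumptions made for illustration; the abstract does not state which coefficient or processing chain the study used.

```python
def evi(nir: float, red: float, blue: float) -> float:
    """Enhanced Vegetation Index with the standard coefficients
    (gain 2.5, C1 = 6, C2 = 7.5, L = 1). Inputs are reflectances in [0, 1]."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def wdrvi(nir: float, red: float, alpha: float = 0.1) -> float:
    """Wide Dynamic Range Vegetation Index.
    alpha down-weights NIR; 0.1 is an assumed value, not taken from the study."""
    return (alpha * nir - red) / (alpha * nir + red)


# Hypothetical reflectances for one pixel of a dense canopy
# (for Sentinel-2 these would typically come from bands B8, B4 and B2).
nir, red, blue = 0.40, 0.05, 0.03
pixel_evi = evi(nir, red, blue)
pixel_wdrvi = wdrvi(nir, red)
```

Applied to every cloud-free acquisition, such per-pixel values would form the time series from which phenological measures can be fitted.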
Radočaj et al. studied this question.
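The difference between the two validation schemes can be sketched with fold assignment alone. The example below is a simplified illustration, not the study's procedure: it assumes a hypothetical 100 m × 100 m field sampled on a 10 m grid, and it uses square spatial blocks as a stand-in for whatever spatial partitioning the authors applied. All function names and parameters are hypothetical.

```python
import random
from collections import defaultdict


def regular_folds(n_samples, k=10, seed=0):
    """Assign each sample to one of k folds at random, ignoring location."""
    rng = random.Random(seed)
    order = list(range(n_samples))
    rng.shuffle(order)
    folds = [0] * n_samples
    for position, sample in enumerate(order):
        folds[sample] = position % k
    return folds


def spatial_block_folds(coords, block_size, k=10, seed=0):
    """Assign whole square blocks of samples to folds so neighbours stay together."""
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        blocks[(int(x // block_size), int(y // block_size))].append(i)
    block_ids = sorted(blocks)
    rng = random.Random(seed)
    rng.shuffle(block_ids)
    folds = [0] * len(coords)
    for position, block in enumerate(block_ids):
        for i in blocks[block]:
            folds[i] = position % k
    return folds


def split_neighbour_fraction(coords, folds, spacing):
    """Fraction of adjacent sample pairs (one grid step apart) placed in different folds."""
    index = {c: i for i, c in enumerate(coords)}
    pairs = split = 0
    for (x, y), i in index.items():
        for neighbour in ((x + spacing, y), (x, y + spacing)):
            j = index.get(neighbour)
            if j is not None:
                pairs += 1
                split += folds[i] != folds[j]
    return split / pairs


# Hypothetical field: 100 yield points on a 10 m grid.
coords = [(x * 10.0, y * 10.0) for x in range(10) for y in range(10)]
regular = regular_folds(len(coords), k=10)
spatial = spatial_block_folds(coords, block_size=20.0, k=10)

regular_split = split_neighbour_fraction(coords, regular, spacing=10.0)
spatial_split = split_neighbour_fraction(coords, spatial, spacing=10.0)
```

A high split fraction means that held-out points frequently sit directly beside training points, which is the spatial dependence that can make regular cross-validation look optimistic. Blocking keeps every within-block pair in the same fold, so fewer test samples have a near-identical neighbour in the training set.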