Accurately predicting crop yield and its spatiotemporal variability is crucial for precision agriculture. This study developed a prior-knowledge-guided remote sensing framework for yield estimation at Youyi Farm, China. Using multi-source data from 2016 to 2025, a Yield-Formation Key Dataset (YFKD) was constructed by integrating Meteorological, Eco-physiological, Phenological, and Soil features. Three models combined with Boruta feature selection were compared: Multiple Linear Regression (MLR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). SHAP (SHapley Additive exPlanations) was then used to analyze spatiotemporal driving forces. The YFKD-XGBoost model achieved the best performance (R² = 0.865, RMSE = 1491 kg/ha), an accuracy improvement of up to 17.7% over the baseline model. Global SHAP analysis showed that Soil Spectral Reflectance made the largest contribution. Temporally, the period from late July to mid-September, and mid-August in particular, was the critical monitoring window. Spatially, based on the area share of the dominant negative SHAP contributor, Meteorological Background was the most widespread limiting factor (34.8% of the constrained area), Soil Conditions constraints were locally clustered (16.4%), and Phenological and Eco-physiological constraints dominated intra-field spatial differentiation. These results demonstrate the feasibility of the framework for high-precision yield estimation and for analyzing the drivers of yield formation with a limited regional dataset (n = 233), providing reliable support for regionally differentiated agricultural management.
Qi et al. (Thu,) studied this question.
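
The spatial statistic quoted above, the area share of the dominant negative SHAP contributor, can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: SHAP values have already been summed per feature group for each pixel, all pixels have equal area, and a pixel counts as "constrained" when at least one group has a negative sum. The function names, group labels, and toy numbers are hypothetical.

```python
from collections import Counter

# Hypothetical feature groups, named after the four YFKD categories.
GROUPS = ("Meteorological", "Eco-physiological", "Phenological", "Soil")


def dominant_negative_group(group_shap):
    """Return the group with the most negative SHAP sum, or None if no group is negative."""
    group, value = min(group_shap.items(), key=lambda kv: kv[1])
    return group if value < 0 else None


def constraint_area_share(pixels):
    """Percent of constrained pixels dominated by each group (equal-area pixels assumed)."""
    dominant = [dominant_negative_group(p) for p in pixels]
    constrained = [g for g in dominant if g is not None]
    if not constrained:
        return {}
    counts = Counter(constrained)
    return {g: 100.0 * counts[g] / len(constrained) for g in GROUPS if g in counts}


# Toy per-pixel SHAP sums in kg/ha (made-up numbers, not results from the study).
pixels = [
    {"Meteorological": -320.0, "Eco-physiological": 110.0, "Phenological": -40.0, "Soil": 75.0},
    {"Meteorological": -150.0, "Eco-physiological": -60.0, "Phenological": 20.0, "Soil": -10.0},
    {"Meteorological": 90.0, "Eco-physiological": 35.0, "Phenological": -210.0, "Soil": -80.0},
    {"Meteorological": 45.0, "Eco-physiological": 10.0, "Phenological": 5.0, "Soil": -130.0},
    {"Meteorological": 60.0, "Eco-physiological": 25.0, "Phenological": 15.0, "Soil": 30.0},
]

shares = constraint_area_share(pixels)
print(shares)
```

In this toy case the fifth pixel has no negative group sum, so it is excluded from the constrained area, and the shares are computed over the remaining four pixels. With real rasters of unequal pixel area, the counts would need to be replaced by area-weighted sums.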