• Genetic basis underlying the parameters in soybean models was uncovered using GWAS.
• Candidate sites without functional validation provide new targets for molecular breeding.
• A soybean phenology model incorporating genetic effects was built via machine learning.
• Machine learning enables low-cost and high-efficiency soybean phenology prediction.

Process-based crop growth models effectively simulate phenology across diverse environments but lack the capacity to assimilate genomic information, which limits their utility in breeding. Conversely, genomic prediction models capture additive genetic effects but fail to provide mechanistic insights into the physiological processes underlying phenotypic outcomes. Bridging this gap holds substantial potential for advancing precision breeding and improving phenotypic prediction for complex traits such as crop phenology. Therefore, an integrated framework was developed in this study to combine genotype-specific parameters (GSPs) from process-based models, specifically DSSAT-CROPGRO-Soybean (DCS) and SoybeanGrow (SG), with genomic information. This integration was accomplished via four machine learning algorithms (rrBLUP, LASSO, RF, and XGBoost). The results revealed that more than 95% of the loci associated with GSPs mapped to previously reported soybean flowering-time and growth-regulation QTLs. The remaining loci, which await functional validation, represent candidate targets for molecular breeding. When genetic effects were incorporated into the process-based models, the RMSE for the whole growth period averaged 7.25 days (DCS) and 7.93 days (SG), which was only 2.43 days and 2.71 days higher than those of the original process-based models, respectively, but was achieved with merely 0.01–0.49% of all SNPs. Among the four algorithms tested, rrBLUP delivered the highest predictive accuracy for phenology simulation in genetic-effect models.
Notably, the two soybean growth models used differ in their photoperiodic and thermal-response algorithms, yet both benefited from genetic integration. These results demonstrate that incorporating genetic effects into process-based models maintains simulation accuracy while substantially enhancing genetic interpretability, offering an efficient approach for molecular breeding.
Liang et al. studied this question.
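To illustrate the core idea of the framework, predicting a genotype-specific parameter from SNP markers and scoring the result by RMSE, the following is a minimal sketch and not the authors' implementation. It uses ridge regression on markers, which has the same form as rrBLUP when the shrinkage parameter is fixed; real rrBLUP typically estimates that parameter from variance components (for example by REML), whereas here `lam=50.0` is simply assumed. The genotype matrix, the GSP values, and all function names (`fit_ridge_markers`, `predict_gsp`, `rmse`) are hypothetical and the data are synthetic.

```python
import numpy as np


def fit_ridge_markers(X, y, lam):
    """Estimate shrunken marker effects with a fixed ridge penalty (assumed, not REML)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Dual form of ridge: solve an n x n system, convenient when markers outnumber lines.
    K = Xc @ Xc.T
    alpha = np.linalg.solve(K + lam * np.eye(X.shape[0]), y - y_mean)
    beta = Xc.T @ alpha  # one effect per marker
    return x_mean, y_mean, beta


def predict_gsp(X, model):
    """Predict a GSP for new lines from their marker genotypes."""
    x_mean, y_mean, beta = model
    return y_mean + (X - x_mean) @ beta


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


# Synthetic stand-in data: 300 lines, 300 SNPs coded 0/1/2 (all values hypothetical).
rng = np.random.default_rng(0)
n_lines, n_snps = 300, 300
X = rng.integers(0, 3, size=(n_lines, n_snps)).astype(float)
true_beta = np.zeros(n_snps)
true_beta[:10] = rng.normal(0.0, 1.0, 10)  # a few causal loci
gsp = 12.0 + X @ true_beta + rng.normal(0.0, 0.5, n_lines)

train = np.arange(240)
test = np.arange(240, n_lines)
model = fit_ridge_markers(X[train], gsp[train], lam=50.0)
pred = predict_gsp(X[test], model)

# Baseline: predict the training mean for every held-out line.
baseline = np.full(test.size, gsp[train].mean())
print("ridge RMSE:   ", rmse(gsp[test], pred))
print("baseline RMSE:", rmse(gsp[test], baseline))
```

In the study's setting, the predicted GSPs would then be passed to a process-based model (DCS or SG) to simulate phenology; that step is omitted here because it depends on the crop model itself.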