Machine learning (ML) models with stochastic, non-deterministic behavior are increasingly used for genomic prediction in plant breeding, but their evaluation often neglects prediction stability and ranking performance. This study addresses that gap by evaluating how two hyperparameters of a Gradient Boosting Machine (GBM), learning rate (v) and number of boosting rounds (ntrees), affect stability and multi-metric predictive performance for cross-season, cross-environment prediction in a MAGIC wheat population. Using a grid search over 36 parameter combinations, we evaluated four agronomic traits with five metrics: Pearson's r, Area Under the Curve (AUC), and Normalized Discounted Cumulative Gain (NDCG) for predictive performance, and the Intraclass Correlation Coefficient (ICC) and Fleiss' κ for stability. Our findings show that a low learning rate combined with a high number of boosting rounds substantially improves prediction stability (ICC > 0.98) and selection stability (Fleiss' κ > 0.80), while reducing train-test performance gaps. This combination produced concurrent improvements in predictive accuracy (r), classification accuracy (AUC), and ranking efficiency (NDCG), though optimal settings were trait-dependent. Despite moderate Pearson's r in this challenging cross-season, cross-environment prediction scenario, NDCG remained high (> 0.85), indicating a strong ability to rank top-performing entries. In benchmark comparisons conducted within this stump-based additive GBM setting, selected GBM configurations were broadly comparable to rrBLUP, with modest trait-dependent differences across metrics. Overall, prioritizing stability when tuning GBMs yields reproducible cross-environment predictions with improved accuracy and top-end ranking performance.
Munroe et al. (Sat,) studied this question.
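
To illustrate how NDCG and Pearson's r can diverge, the following minimal Python sketch computes both on simulated data. It is not the study's implementation: the function name `ndcg_at_k`, the use of min-shifted observed values as gains, the log2 position discount, the cutoff `k`, and the simulated phenotypes are all assumptions made for illustration, and the exact NDCG formulation used in the study may differ.

```python
import numpy as np


def ndcg_at_k(observed, predicted, k):
    """NDCG@k for entries ranked by predicted value (hypothetical helper).

    Gains are the observed values shifted so the minimum is zero
    (an assumption; other gain definitions are possible).
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    k = min(k, observed.size)

    gains = observed - observed.min()               # non-negative gains
    discounts = 1.0 / np.log2(np.arange(2, k + 2))  # 1/log2(rank + 1)

    top_by_prediction = np.argsort(-predicted)[:k]  # model's top-k picks
    top_by_truth = np.argsort(-observed)[:k]        # ideal top-k picks

    dcg = np.sum(gains[top_by_prediction] * discounts)
    idcg = np.sum(gains[top_by_truth] * discounts)
    return dcg / idcg if idcg > 0 else 0.0


# Simulated example (illustrative values only, not data from the study).
rng = np.random.default_rng(0)
observed = rng.normal(loc=0.0, scale=1.0, size=200)
predicted = observed + rng.normal(loc=0.0, scale=1.5, size=200)

r = np.corrcoef(observed, predicted)[0, 1]
ndcg = ndcg_at_k(observed, predicted, k=20)
print(f"Pearson's r = {r:.2f}, NDCG@20 = {ndcg:.2f}")
```

Because NDCG weights only the top-ranked entries and discounts by position, it rewards a model that places truly superior entries near the top even when its predictions are noisy across the full range, which is the part of the distribution that Pearson's r also penalizes.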