Integrating agricultural remote sensing with phenomics to predict rice quality across the full growth period is vital for early, non-destructive screening and breeding; however, studies that combine genomic and multi-source phenotypic data across multiple environments remain limited. This study addresses that gap by integrating genomic SNP data, UAV-based spectral data, and individual-plant multidimensional phenotypic data for 61 indica rice varieties grown in field and greenhouse environments. Because this is a proof-of-concept study with far more predictors than samples (the "p >> n" problem), feature selection methods (LASSO, mutual information (MI), recursive feature elimination (RFE), and the successive projections algorithm (SPA)) were used to mitigate overfitting. The results indicate that amylose content is predominantly under genetic control, protein content is genetically determined but also influenced by gene-environment interactions, and chalkiness traits are determined by three combined factors. For amylose content, SNP data with the Random Forest model performed best at the population level (phenomics data from field UAV remote sensing of variety populations), with R² = 0.92, MAE = 1.1, and RMSE = 1.5, while the Stacking Ensemble method improved accuracy at the individual level (phenomics data from greenhouse single-plant phenotyping per variety). Chalky grain rate and chalkiness degree were predicted with accuracy comparable to that obtained with SNP data, and Stacking significantly improved performance at the population level (R² = 0.89 and 0.85, respectively). Prediction accuracy for protein content remained relatively low (best R² = 0.56) because of strong environmental sensitivity and complex interactions. This framework extends traditional single-environment, single-data-source approaches and provides an effective strategy for early, high-throughput, non-destructive rice quality screening. Further validation with larger datasets, more growing seasons, or independent populations is required before it can be applied reliably in breeding practice.
Zhang et al. (Sat,) studied this question.
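The modelling strategy summarised in the abstract (feature selection to cope with "p >> n", followed by Random Forest and Stacking regressors scored by R², MAE, and RMSE) can be illustrated with a minimal sketch. This is not the study's actual pipeline: the data are synthetic, the marker count, trait effects, hyperparameters, and the choice of Ridge as the second base learner and meta-learner are all assumptions made for illustration, and only LASSO-based selection is shown. The one design point worth noting is that feature selection sits inside the cross-validated pipeline, so it is refit on each training fold rather than on all 61 samples.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for real data (assumption): 61 varieties and
# 2000 SNP-like markers coded 0/1/2, so p >> n.
n_varieties, n_markers = 61, 2000
X = rng.integers(0, 3, size=(n_varieties, n_markers)).astype(float)

# Hypothetical trait driven by the first five markers plus noise.
effects = np.array([3.0, -2.0, 1.5, 1.0, -1.0])
y = X[:, :5] @ effects + rng.normal(0.0, 0.5, n_varieties)

# LASSO-based selection keeps at most 30 markers with non-zero coefficients.
selector = SelectFromModel(Lasso(alpha=0.1, max_iter=10000), max_features=30)

# Stacking ensemble: Random Forest and Ridge base learners, Ridge meta-learner.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
        ("ridge", Ridge(alpha=1.0)),
    ],
    final_estimator=Ridge(alpha=1.0),
    cv=5,
)

# Scaling and selection are part of the pipeline, so both are refit on
# each training fold and never see the held-out varieties.
model = make_pipeline(StandardScaler(), selector, stack)

# Out-of-fold predictions from 5-fold cross-validation.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
y_pred = cross_val_predict(model, X, y, cv=cv)

r2 = r2_score(y, y_pred)
mae = mean_absolute_error(y, y_pred)
rmse = float(np.sqrt(mean_squared_error(y, y_pred)))
print(f"R2 = {r2:.2f}, MAE = {mae:.2f}, RMSE = {rmse:.2f}")
```

With only 61 samples, scores from a sketch like this vary noticeably with the fold split and random seed, which is consistent with the abstract's call for validation on larger or independent populations.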