Oil content is a critical indicator of the economic value of Camellia oleifera seeds. Because macroscopic studies of oil content detection and visualization offer little mechanistic interpretation, this study explored the feasibility of using hyperspectral microscope imaging (HMI) to evaluate oil content and visualize its spatial distribution at the cellular level in Camellia oleifera seed kernels. Practical constraints of equipment and experimental conditions limited the number of samples, which adversely affects model performance. To address this limitation, a Wasserstein generative adversarial network with gradient penalty (WGAN-GP) was employed to augment the spectral and oil content data, and qualitative and quantitative methods were used to systematically compare the generated data with the real data. Partial least squares regression (PLSR) and convolutional neural network regression (CNNR) models were constructed by adding different proportions of generated data to the original calibration set. The results showed that WGAN-GP outperformed the traditional deep convolutional GAN (DCGAN) in data augmentation and improved the prediction performance of the PLSR and CNNR models, with $R_p^2$ reaching 0.6981 and 0.8220, respectively, representing increases of 31.89% and 13.19%. This study not only provides a new microscopic mechanistic analysis to support macroscopic nondestructive detection of Camellia oleifera seeds, but also offers an important reference for accurately predicting physicochemical traits with HMI under small-sample conditions.
• HMI combined with DL was applied to predict oil content in Camellia oleifera seed kernels.
• WGAN-GP was improved to augment spectral and oil content data.
• Visualization displayed the distribution of oil content at the microscopic scale.
• Data augmentation significantly improved model performance.
Yuan et al. (Fri,) studied this question.
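The step of adding different proportions of generated data to the original calibration set can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `augment_calibration`, the representation of each sample as a `(spectrum, oil_content)` pair, the definition of the proportion as a fraction of the real calibration set size, and the random selection of generated samples are all assumptions made for illustration.

```python
import random


def augment_calibration(real, generated, proportion, seed=0):
    """Return the real calibration samples plus a random subset of generated ones.

    real, generated: lists of (spectrum, oil_content) pairs (hypothetical layout).
    proportion: number of generated samples to add, expressed as a fraction
                of len(real); this definition is an assumption.
    """
    if proportion < 0:
        raise ValueError("proportion must be non-negative")
    n_add = round(proportion * len(real))
    if n_add > len(generated):
        raise ValueError("not enough generated samples for this proportion")
    rng = random.Random(seed)            # fixed seed for reproducible subsets
    picked = rng.sample(generated, n_add)  # sample without replacement
    return list(real) + picked


# Toy placeholder data, not values from the study.
real_set = [([0.10, 0.20], 40.0), ([0.15, 0.25], 42.5),
            ([0.12, 0.22], 41.0), ([0.18, 0.28], 44.0)]
generated_set = [([0.11 + 0.01 * i, 0.21 + 0.01 * i], 40.0 + 0.5 * i)
                 for i in range(10)]

half_augmented = augment_calibration(real_set, generated_set, proportion=0.5)
double_augmented = augment_calibration(real_set, generated_set, proportion=2.0)
```

Each augmented set would then serve as the calibration set for a regression model, while the prediction set stays purely real so that $R_p^2$ remains comparable across proportions.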