This study aims to expand the application of handheld near-infrared (NIR) spectroscopy combined with machine learning (ML) to the quantitative analysis of Epimedium. After establishing and evaluating six promising models, namely Gradient Boosting Regression (GBR), Extra Trees (ET), Extreme Gradient Boosting (eXGB), Support Vector Regression (SVR), Random Forest (RF), and Partial Least Squares Regression (PLSR), we selected PLSR and SVR for further optimization because of their excellent performance in terms of R² and RMSE. Using the NIPPY library for data preprocessing and a genetic algorithm (GA) for feature selection, we established models for predicting the contents of four active components in Epimedium. The GA-SVR hybrid model delivered the following predictive performance for the four target components: icariin (R²: 0.9556, RMSE: 0.0031, Avg. MAPE: 2.07%), epimedin A (R²: 0.9586, RMSE: 0.0004, Avg. MAPE: 3.60%), epimedin B (R²: 0.9432, RMSE: 0.0003, Avg. MAPE: 3.58%), and epimedin C (R²: 0.9232, RMSE: 0.0182, Avg. MAPE: 2.33%). These findings establish a novel paradigm for high-throughput analysis and quality assessment of active ingredients in food and traditional Chinese medicine.

• The NIPPY library can effectively improve prediction accuracy.
• Optimal wavelength selection is essential for SVR prediction models.
• GA enables automatic screening of optimal characteristic wavelengths.
• NIR combined with ML could be applied to real-time quality control of herbs.
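The three figures of merit reported above (R², RMSE, and MAPE) can be illustrated with a minimal Python sketch based on their standard textbook definitions. The function names and the sample values are hypothetical and are not taken from the study, and how the reported "Avg. MAPE" was averaged (for example, across folds or repeated runs) is not specified here, so the sketch computes a single MAPE over one set of predictions.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    ratios = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * ratios / len(y_true)


# Hypothetical reference and predicted contents, for illustration only.
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(r2_score(reference, predicted), rmse(reference, predicted), mape(reference, predicted))
```

Because RMSE is expressed in the units of the target, values for components present at very different concentrations are not directly comparable, whereas R² and MAPE are scale-free.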
Yao et al. (Sun,) studied this question.
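The idea of GA-based wavelength screening mentioned in the abstract can be sketched as follows. This is not the authors' implementation: each candidate solution is a binary mask over wavelengths, and, to keep the example dependency-free, the fitness is a stand-in (negative RMSE of a one-variable least-squares fit on the mean of the selected bands) rather than an SVR. In practice the fitness would more plausibly be a cross-validated SVR error. All names, the population size, the number of generations, and the mutation rate are assumptions chosen for illustration.

```python
import random


def fitness(mask, X, y):
    """Stand-in fitness (assumption, not SVR): negative RMSE of a simple
    least-squares line fitted to the mean of the selected bands."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return float("-inf")  # an empty selection cannot predict anything
    z = [sum(row[j] for j in idx) / len(idx) for row in X]
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    szz = sum((a - mz) ** 2 for a in z)
    if szz == 0:
        return float("-inf")  # constant predictor, no usable slope
    slope = sum((a - mz) * (b - my) for a, b in zip(z, y)) / szz
    intercept = my - slope * mz
    sq_err = sum((intercept + slope * a - b) ** 2 for a, b in zip(z, y))
    return -((sq_err / n) ** 0.5)


def ga_select(X, y, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Return a 0/1 mask over the columns of X (needs at least 2 columns)."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist selection: keep the better half of the population unchanged.
        ranked = sorted(pop, key=lambda m: fitness(m, X, y), reverse=True)
        elite = ranked[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_feat)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per wavelength.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda m: fitness(m, X, y))
```

Calling `ga_select(X, y)` with `X` as a list of spectra (one row per sample, one column per wavelength) and `y` as the measured contents returns a mask whose ones mark the retained wavelengths. Because the better half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next.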