To address the inefficiency and subjectivity of manual grading, this study established a machine learning model based on near-infrared hyperspectral data (950–1650 nm) for the accurate classification of first-roasted tobacco grades. Multivariate statistical analysis was used to examine the correlations among grade, spectral data, and chemical composition, laying a theoretical foundation for hyperspectral grading technology. Three preprocessing methods (multiplicative scatter correction (MSC), standard normal variate transformation, and Savitzky–Golay convolutional smoothing) and four classification models (random forest, backpropagation neural network, extreme learning machine, and partial least squares–discriminant analysis (PLS-DA)) were employed. Characteristic bands were selected with the successive projections algorithm (SPA) and competitive adaptive reweighted sampling to investigate how the number of characteristic bands affects grade classification accuracy. The results showed that grade was highly significantly correlated with nicotine, reducing sugars, total sugars, and the sugar-nicotine ratio, and that the spectra were highly significantly correlated with nicotine. Full-band spectra with MSC preprocessing combined with the PLS-DA model reached a classification accuracy of 98.5%, while the accuracy was 94.0% when using 70% of the full bands selected with the SPA. In conclusion, near-infrared hyperspectral analysis combined with machine learning offers efficient, accurate, and non-destructive grading of first-roasted tobacco leaves, and it provides a theoretical basis for industrial hyperspectral grading by elucidating the correlations among spectrum, chemical composition, and grade. This method avoids the subjectivity of manual grading and offers key technical support for intelligent, automated grading of first-roasted tobacco leaves in the tobacco industry.
Zou et al. studied this question.
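The abstract names multiplicative scatter correction (MSC) and standard normal variate (SNV) transformation as preprocessing steps but does not give their formulas. The following is a minimal sketch of the textbook definitions of both, not the authors' implementation. The function names `snv` and `msc` and the toy spectra are illustrative assumptions, and plain Python lists are used only to keep the example self-contained; a real pipeline would more likely operate on arrays.

```python
from statistics import fmean, stdev


def snv(spectrum):
    """Standard normal variate: centre one spectrum on its own mean
    and scale it by its own sample standard deviation."""
    mu = fmean(spectrum)
    sd = stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    return [(x - mu) / sd for x in spectrum]


def msc(spectra, reference=None):
    """Multiplicative scatter correction.

    Each spectrum is regressed on a reference spectrum by least squares
    (x ~ intercept + slope * reference), then corrected as
    (x - intercept) / slope. The reference defaults to the mean spectrum
    of the set, which is an assumption of this sketch.
    """
    n_bands = len(spectra[0])
    if reference is None:
        reference = [fmean(s[j] for s in spectra) for j in range(n_bands)]
    ref_mean = fmean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)

    corrected = []
    for s in spectra:
        s_mean = fmean(s)
        # Ordinary least-squares slope and intercept of s against reference.
        slope = sum((r - ref_mean) * (x - s_mean)
                    for r, x in zip(reference, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


if __name__ == "__main__":
    # Hypothetical toy data: one base shape with different additive and
    # multiplicative scatter effects applied to it.
    base = [0.2, 0.4, 0.9, 0.5, 0.3]
    spectra = [[a * x + b for x in base]
               for a, b in [(1.0, 0.0), (1.5, 0.1), (0.7, -0.05)]]
    for row in msc(spectra):
        print([round(v, 4) for v in row])
```

In this toy case the three spectra differ only by an offset and a gain, so MSC maps all of them onto the same corrected spectrum. Real hyperspectral data also carry chemical variation, which is what remains for a classifier such as PLS-DA to use after correction.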