The fermentation process of oolong tea is a critical step in shaping its quality and flavor profile. In this study, the fermentation degree of Anxi Tieguanyin oolong tea was assessed using image and hyperspectral features. Machine learning algorithms, including Support Vector Machine (SVM), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU), were employed to develop models based on both single-source features and multi-source fused features. First, color and texture features were extracted from RGB images and then processed through Pearson correlation-based feature selection and Principal Component Analysis (PCA) for dimensionality reduction. For the hyperspectral data, preprocessing was conducted using Normalization (Nor) and Standard Normal Variate (SNV), followed by feature selection and dimensionality reduction with Competitive Adaptive Reweighted Sampling (CARS), Successive Projections Algorithm (SPA), and PCA. Mid-level fusion was then performed on the two feature sets, and the most relevant fused features were selected using L1 regularization for the final modeling stage. Finally, SHapley Additive exPlanations (SHAP) analysis was conducted on the optimal models to identify the key hyperspectral bands and image features. Models based on single-source features achieved test set accuracies of 68.06% to 87.50%, while models based on fused features achieved 77.78% to 94.44%. The Pearson+Nor-SPA+L1+SVM fusion model achieved the highest accuracy, 94.44%. These results indicate that multi-source feature fusion characterizes the fermentation process more comprehensively than either source alone and substantially improves model accuracy. SHAP analysis revealed that the hyperspectral bands at 967, 942, 814, 784, 781, 503, 413, and 416 nm, along with the image features Hσ and H, played the most crucial roles in distinguishing tea fermentation stages.
These findings provide a scientific basis for assessing the fermentation degree of Tieguanyin oolong tea and support the development of intelligent detection systems.
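The fusion workflow summarized above (spectral preprocessing, mid-level fusion, L1-based selection, SVM classification) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn, uses synthetic data in place of the tea measurements, omits the Pearson, CARS, SPA, and PCA steps, and realizes "L1 regularization" as an L1-penalized linear SVM inside `SelectFromModel`, which is only one possible choice. The function names, array sizes, and hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC


def snv(spectra):
    """Standard Normal Variate: scale each spectrum (row) by its own mean and std."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def fuse_mid_level(image_features, spectral_features):
    """Mid-level fusion: concatenate per-sample feature vectors from both sources."""
    return np.hstack([image_features, spectral_features])


def build_fusion_classifier(c_select=0.5, random_state=0):
    """Standardize, keep features with non-zero L1 weights, then classify with an SVM."""
    # Assumption: an L1-penalized linear SVM stands in for the paper's L1 selection step.
    l1_model = LinearSVC(penalty="l1", dual=False, C=c_select,
                         max_iter=5000, random_state=random_state)
    return make_pipeline(StandardScaler(), SelectFromModel(l1_model), SVC(kernel="rbf"))


# Synthetic stand-in data: 120 samples, 3 fermentation stages (hypothetical sizes).
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 40)
image_features = rng.normal(size=(120, 6))
image_features[:, 0] += 3.0 * labels            # one informative image feature
spectra = rng.normal(loc=10.0, scale=0.1, size=(120, 50))
spectra[:, 5] += 0.5 * labels                   # one informative band

spectra_snv = snv(spectra)
fused = fuse_mid_level(image_features, spectra_snv)
model = build_fusion_classifier().fit(fused, labels)
train_accuracy = model.score(fused, labels)     # training accuracy on synthetic data only
```

Placing the selector inside the pipeline means the L1 step is refit on each training split during cross-validation, which avoids leaking test information into feature selection.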
Huang et al. (Mon,) studied this question.