ABSTRACT Traditional tea quality evaluation depends on human evaluators, which limits scalability and consistency. To establish an artificial intelligence (AI)‐assisted framework for comprehensive evaluation of tea quality and detailed assessment of the tea consumption experience, this study develops Long‐Tea‐CLIP (Contrastive Language‐Image Pre‐training), a multimodal tea grading system that combines computer vision and chemoinformatics. It covers five sensory evaluation dimensions for green tea, using separate submodels: ResNet‐18 for appearance, eXtreme Gradient Boosting (XGBoost) for soup color, and a multilayer perceptron (MLP) for aroma, infused leaf, and taste. A deep network derived from ResNet‐18 integrates dry tea images with seven subdimensions of sensory comments to achieve a refined appearance “grading.” To enhance accuracy, we apply a Tip‐CLIP supervised MLP to features extracted from infused leaf and to chemical data for aroma and taste. Submodel outputs are weighted and combined in a unified framework to produce an overall score. Trained on 7763 image‐text pairs from 38 Longjing tea varieties, Long‐Tea‐CLIP achieves 92% accuracy, indicating its potential to enhance tea quality control and market transparency.
Xu et al. (Fri,) studied this question.
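The weighting of submodel outputs into an overall score can be illustrated with a minimal sketch. The abstract does not report the weights or the score scale, so the values in `WEIGHTS`, the 0 to 100 scale, and the names `WEIGHTS` and `overall_score` are hypothetical placeholders chosen only for illustration; they should not be read as the configuration used in the study.

```python
from typing import Dict

# Hypothetical weights for the five sensory dimensions.
# Illustrative only: the abstract does not state the values used in the study.
WEIGHTS: Dict[str, float] = {
    "appearance": 0.25,
    "soup_color": 0.10,
    "aroma": 0.25,
    "taste": 0.30,
    "infused_leaf": 0.10,
}


def overall_score(sub_scores: Dict[str, float],
                  weights: Dict[str, float] = WEIGHTS) -> float:
    """Combine per-dimension scores into one weighted average.

    Assumes every sub-score is on the same scale (for example 0 to 100).
    """
    missing = set(weights) - set(sub_scores)
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalise by the weight total so the result stays on the input scale
    # even if the weights do not sum exactly to 1.
    weighted_sum = sum(weights[name] * sub_scores[name] for name in weights)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical submodel outputs for one tea sample.
    sample = {
        "appearance": 90.0,
        "soup_color": 80.0,
        "aroma": 85.0,
        "taste": 88.0,
        "infused_leaf": 82.0,
    }
    print(round(overall_score(sample), 2))
```

Dividing by the weight total is a design choice for the sketch: it keeps the overall score on the same scale as the sub-scores if the weights are later re-tuned without being renormalised.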