What question did this study set out to answer?

This research aims to develop a comprehensive AI-assisted framework for assessing green tea quality.

March 29, 2026 | Open Access

Long‐Tea‐CLIP: An Expert‐Level Multimodal AI Framework for Fine‐Grained Green Tea Grading Across Five Sensory Dimensions

Key Points

Developed Long-Tea-CLIP, which integrates computer vision and chemoinformatics for tea grading.
Utilized separate submodels for evaluating appearance, soup color, aroma, infused leaf, and taste.
Implemented ResNet-18 for appearance grading using dry tea images and sensory comments.
Applied MLP and XGBoost for feature extraction and score integration from aroma and taste data.
Trained the framework on 7763 image-text pairs from 38 Longjing tea varieties.
Achieved 92% accuracy in tea quality evaluation with Long-Tea-CLIP.
Demonstrated potential to improve tea quality control.
Enhanced market transparency through detailed assessment of the consumption experience.

Abstract

Traditional tea quality evaluation depends on human evaluators, which limits scalability and consistency. To establish an artificial intelligence (AI)‐assisted framework for comprehensive evaluation of tea quality and detailed assessment of the tea consumption experience, this study develops Long‐Tea‐CLIP (Contrastive Language‐Image Pre‐training), a multimodal tea grading system that combines computer vision and chemoinformatics. It integrates five sensory evaluation dimensions for green tea, using separate submodels for appearance (ResNet‐18), soup color (eXtreme Gradient Boosting (XGBoost)), aroma, infused leaf, and taste (multilayer perceptron (MLP)). A deep network derived from ResNet‐18 integrates dry tea images with seven subdimensions of sensory comments to achieve refined appearance grading. We apply a Tip‐CLIP supervised MLP for feature extraction from infused leaf data and from chemical data of aroma and taste to enhance accuracy. Submodel outputs are weighted into a unified framework to produce an overall score. Trained on 7763 image‐text pairs from 38 Longjing tea varieties, Long‐Tea‐CLIP achieves 92% accuracy, indicating its potential to enhance tea quality control and market transparency.
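
The final step described in the abstract, weighting the five submodel outputs into one overall score, can be illustrated with a minimal Python sketch. The abstract does not report the weights or the score scale, so the weight values, the 0 to 100 scale, and the names `ASSUMED_WEIGHTS` and `overall_score` are hypothetical placeholders chosen for illustration, not the authors' implementation.

```python
from typing import Dict

# Hypothetical weights for the five sensory dimensions named in the abstract.
# The source does not state the actual weights; these values are placeholders.
ASSUMED_WEIGHTS: Dict[str, float] = {
    "appearance": 0.25,
    "soup_color": 0.10,
    "aroma": 0.25,
    "taste": 0.30,
    "infused_leaf": 0.10,
}


def overall_score(sub_scores: Dict[str, float],
                  weights: Dict[str, float] = ASSUMED_WEIGHTS) -> float:
    """Combine per-dimension scores into one weighted average.

    Each entry in `sub_scores` stands in for the output of one submodel
    (assumed here to be on a 0-100 scale).
    """
    missing = set(weights) - set(sub_scores)
    if missing:
        raise ValueError(f"missing sub-scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize by the weight sum so the result stays on the input scale
    # even if the weights do not add up to exactly 1.
    weighted_sum = sum(weights[k] * sub_scores[k] for k in weights)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Made-up example scores, one per submodel.
    example = {
        "appearance": 90.0,
        "soup_color": 85.0,
        "aroma": 88.0,
        "taste": 92.0,
        "infused_leaf": 80.0,
    }
    print(round(overall_score(example), 2))
```

Dividing by the weight sum keeps the combined score on the same scale as the inputs, which makes it easy to swap in a different weighting scheme without rescaling.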
