Rice (Oryza sativa L.) is a staple food for over half of the global population and carries significant economic, agricultural, and cultural importance, particularly in Asia. Thousands of rice varieties exist worldwide, differing in size, shape, color, and texture, which makes accurate classification essential for quality control, breeding programs, and authenticity verification in trade and research. Traditional manual identification of rice varieties is time-consuming, error-prone, and heavily reliant on expert knowledge. Deep learning offers an efficient alternative by automatically extracting discriminative features from rice grain images. While prior studies have primarily employed convolutional deep learning models such as CNN, VGG, InceptionV3, MobileNet, and DenseNet201, transformer-based models remain underexplored for rice variety classification. This study addresses that gap by applying two transformer-based models, Swin Transformer and Vision Transformer (ViT), to multi-class classification of rice varieties using the publicly available PRBD dataset from Bangladesh. Experimental results show that the ViT model achieved an accuracy of 99.86% with precision, recall, and F1-score all at 0.9986, while the Swin Transformer model obtained an accuracy of 99.44% with a precision of 0.9944, recall of 0.9944, and F1-score of 0.9943. These results highlight the effectiveness of transformer-based models for high-accuracy rice variety classification.
Tabassum et al. studied this question.
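The reported accuracy, precision, recall, and F1-score can all be derived from a multi-class confusion matrix. The following is a minimal sketch, not the authors' evaluation code: the function name `classification_metrics` and the toy matrix are hypothetical, and support-weighted averaging is an assumption, since the abstract does not state which averaging scheme was used. Weighted recall is always equal to accuracy, which is consistent with the matching recall and accuracy values reported for both models, although other explanations are possible.

```python
from typing import Dict, List


def classification_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus support-weighted precision, recall, and F1.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j. Weighted averaging is an assumption
    made for this sketch; the source does not specify the scheme.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n_classes))

    precision = recall = f1 = 0.0
    for k in range(n_classes):
        tp = confusion[k][k]
        support = sum(confusion[k])  # samples whose true class is k
        predicted = sum(confusion[i][k] for i in range(n_classes))  # samples predicted as k
        p_k = tp / predicted if predicted else 0.0
        r_k = tp / support if support else 0.0
        f_k = 2 * p_k * r_k / (p_k + r_k) if (p_k + r_k) else 0.0
        weight = support / total  # class share of the evaluation set
        precision += weight * p_k
        recall += weight * r_k
        f1 += weight * f_k

    return {
        "accuracy": correct / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical 3-class confusion matrix for illustration only;
# these counts are not data from the study.
toy = [
    [50, 0, 0],
    [1, 48, 1],
    [0, 2, 48],
]
metrics = classification_metrics(toy)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With macro averaging instead, each class would receive equal weight regardless of its sample count, and recall would no longer be guaranteed to match accuracy on an imbalanced test set.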