What question did this study set out to answer?

This study evaluates how effectively the DINO self-supervised learning framework supports automated potato inspection in agricultural systems.

June 11, 2026 · Open Access

A comprehensive analytical assessment of DINO-based architectures for sustainable automated potato inspection in the Al Kharj Region

Key Points

The study evaluates the effectiveness of the DINO framework for self-supervised potato inspection in agricultural systems.
It reviews the DINO framework, focusing on architectural design and hyperparameters.
It evaluates the DINO-ViT model on a potato leaf disease dataset under rigorous experimental conditions.
It compares DINO-ViT's performance with conventional machine learning models such as Random Forest and SVMs.
DINO-ViT achieved an accuracy of 0.8344 and an F1-score of 0.8334 in potato disease detection.
The model learned robust feature representations with clear inter-class separability, as shown by PCA and t-SNE visualizations.
DINO-ViT outperformed conventional machine learning approaches in accuracy and class-separation metrics.

Abstract

Self-supervised learning (SSL) has emerged as a pivotal paradigm in computer vision, particularly for applications where labeled data are limited, costly, or labor-intensive to obtain. This paradigm is especially relevant to sustainable agricultural systems, where large-scale manual annotation is impractical, and resource efficiency is a key objective. In agricultural vision tasks such as potato disease detection, quality assessment, and automated inspection, SSL has demonstrated strong potential to address the substantial visual variability encountered under real-world field conditions. Among recent SSL approaches, the Distillation with No Labels (DINO) framework, built upon Vision Transformers (ViTs) and multi-crop data augmentation, has shown remarkable capability in learning robust and transferable feature representations. In this study, we present a comprehensive analytical review of the DINO framework within the context of sustainable automated potato inspection systems in the Al Kharj region. The analysis systematically investigates architectural design choices, critical hyperparameters, teacher–student distillation dynamics, optimization behavior, and the mathematical mechanisms employed to prevent representation collapse. Using a potato leaf disease dataset, the DINO-ViT model is evaluated under rigorous experimental conditions. The results demonstrate that DINO-ViT effectively learns compact and semantically meaningful feature representations with clear inter-class separability, as evidenced by principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE) visualizations, and quantitative class-separation metrics. Furthermore, when compared with conventional machine learning approaches, including Random Forest, Support Vector Machines, Gradient Boosting, and k-Nearest Neighbors, the DINO-ViT model achieves superior performance, attaining an accuracy of 0.8344 and an F1-score of 0.8334.
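
The teacher–student distillation and collapse-prevention mechanisms mentioned above can be illustrated with a minimal sketch in plain Python. It follows the general DINO recipe (centering and sharpening of the teacher output, a cross-entropy loss against the student, and an exponential moving average for the teacher weights). The function names, the toy vectors, and the default temperatures and momenta are assumptions chosen for illustration; they are not the settings or the implementation used in this study.

```python
import math

def log_softmax(logits, temperature):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]

def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the teacher target and the student prediction
    for one pair of views. Temperatures are illustrative defaults."""
    # Centering: subtract a running mean so no single dimension dominates.
    centered = [t - c for t, c in zip(teacher_logits, center)]
    # Sharpening: a low teacher temperature keeps the target far from uniform.
    p_teacher = [math.exp(lp) for lp in log_softmax(centered, teacher_temp)]
    log_p_student = log_softmax(student_logits, student_temp)
    return -sum(pt * lps for pt, lps in zip(p_teacher, log_p_student))

def update_center(center, teacher_batch, momentum=0.9):
    """Move the center toward the batch mean of the teacher outputs."""
    n = len(teacher_batch)
    batch_mean = [sum(col) / n for col in zip(*teacher_batch)]
    return [momentum * c + (1 - momentum) * b
            for c, b in zip(center, batch_mean)]

def ema_update(teacher_params, student_params, momentum=0.996):
    """Teacher weights follow the student by exponential moving average;
    the teacher itself receives no gradient."""
    return [momentum * t + (1 - momentum) * s
            for t, s in zip(teacher_params, student_params)]

if __name__ == "__main__":
    # Hypothetical 3-dimensional outputs for two views of one image.
    teacher_out = [0.0, 1.0, 0.0]
    center = [0.0, 0.0, 0.0]
    print("aligned student:   ", dino_loss([0.0, 1.0, 0.0], teacher_out, center))
    print("misaligned student:", dino_loss([1.0, 0.0, 0.0], teacher_out, center))
    print("updated center:    ", update_center(center, [teacher_out]))
```

Centering and sharpening pull in opposite directions: centering alone would push the teacher output toward a uniform distribution, while sharpening alone would let one dimension dominate, so the two are applied together to avoid either form of collapse.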
