March 3, 2026 · Open Access

Benchmarking State-of-the-Art Vision Transformer Architectures for the Automated Classification of Pigmented Skin Lesions

Key Points

DeiT III-Base achieved the highest accuracy of the four models compared (92.04%, with an F1-score of 85.44%) in classifying pigmented skin lesions.
Images from the HAM10000 dataset were resized to 224×224 pixels and normalized using ImageNet parameters before training.
DeiT III-Base and ViT-Base ran inference in under a millisecond (0.5674 ms and 0.5459 ms, respectively), a throughput suited to clinical use.
These findings support attention-based models as effective tools for real-time computer-aided diagnosis, which could aid early detection.
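
The normalization step named in the key points can be illustrated with a minimal sketch. It assumes the commonly used ImageNet channel statistics and 8-bit RGB input; the summary does not state the exact values or the library the authors used, and `normalize_pixel` and `normalize_image` are hypothetical helper names. Resizing to 224×224 is left out, since it would normally be handled by an imaging library.

```python
# Commonly used ImageNet channel statistics (assumed; not stated in the summary).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


def normalize_image(image):
    """Apply normalize_pixel to every pixel of an image given as a list of rows."""
    return [[normalize_pixel(pixel) for pixel in row] for row in image]
```

After this step each channel is centered near zero, which matches the input distribution that ImageNet-pretrained weights were trained on.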

Abstract

Skin cancer represents an escalating global public health challenge where early detection is paramount, potentially increasing five-year survival rates to 99%. While dermoscopy improves diagnostic sensitivity, its effectiveness often depends on clinician experience and is subject to inter-observer variability. To address these limitations, this study presents a rigorous comparative analysis of four state-of-the-art Vision Transformer (ViT) architectures, DeiT III-Base, Swin-Base, ViT-Base, and PiT-B, for the automated classification of pigmented skin lesions. We utilized the HAM10000 dataset (n=10,011) and implemented a stratified 70-15-15 split to ensure balanced training, validation, and testing phases. Images were resized to 224×224 pixels and normalized using ImageNet parameters, while transfer learning was employed to stabilize training and enhance generalization. Experimental results indicate that DeiT III-Base achieved superior diagnostic efficacy, reaching an accuracy of 92.04% and an F1-score of 85.44%. Furthermore, computational evaluation revealed that DeiT III-Base and ViT-Base offered highly efficient clinical throughput with sub-millisecond inference times (0.5674 ms and 0.5459 ms, respectively), whereas PiT-B exhibited the lowest computational workload (21.1067 GFLOPs). These findings underscore the viability of attention-based paradigms as robust real-time Computer-Aided Diagnosis (CAD) tools. Future research will explore the integration of multi-modal patient data and Explainable AI (XAI) to foster transparency and clinical trust.
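
The stratified 70-15-15 split described in the abstract can be sketched as follows. This is an illustrative, standard-library version only: the abstract does not say which tool or random seed the authors used, and `stratified_split`, its parameters, and the rounding rule are assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists that preserve per-class proportions."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train.extend(indices[:n_train])
        val.extend(indices[n_train:n_train + n_val])
        # Whatever remains after rounding goes to the test set.
        test.extend(indices[n_train + n_val:])
    return train, val, test
```

Splitting within each class, rather than over the pooled dataset, keeps rare lesion types represented in all three subsets. Calling `stratified_split(labels)` on a list of per-image class labels returns three disjoint index lists that together cover every image.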

Cite This Study

Islam et al. studied this question.

synapsesocial.com/papers/69a75bdbc6e9836116a23ed5
https://doi.org/10.69882/adba.cem.2026015
