In recent years, deep learning has achieved remarkable advances in medical image analysis, particularly through Convolutional Neural Networks (CNNs) and Transformer-based architectures. This study evaluates and compares five transfer learning models (DenseNet169, InceptionV3, MobileNetV2, VGG16, and Xception) and a Vision Transformer (ViT) model for skin cancer classification on the “Skin Cancer: Malignant vs. Benign” dataset. In the first phase, the ViT model achieved the highest overall performance, with 93.79% recall, 92.22% precision, 93.00% F1-score, and 92.42% accuracy. Although InceptionV3 and MobileNetV2 showed strong recall, they did not match the overall accuracy of ViT. In the second phase, image enhancement techniques (grayscale conversion, thresholding, Canny edge detection, dilation, and erosion) were applied to emphasize lesion boundaries and improve contrast. On these enhanced images, the ViT model again performed best, with 95.49% recall, 94.17% precision, 94.83% F1-score, and 94.39% accuracy. These results indicate that, on this dataset, the ViT architecture delivers higher accuracy and reliability than the CNN-based transfer learning models on both the original and the enhanced images. Furthermore, incorporating image preprocessing raised ViT accuracy from 92.42% to 94.39%, indicating that such techniques can substantially improve the performance of deep learning models in medical imaging applications.
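The reported F1-scores can be cross-checked against the reported precision and recall, because F1 is their harmonic mean, F1 = 2PR / (P + R). The Python sketch below is illustrative and is not the author's evaluation code: it computes the four metrics from confusion-matrix counts and recomputes F1 for the two ViT results. The `BinaryCounts` type and all function names are hypothetical, and treating "malignant" as the positive class is an assumption, since the abstract does not state which class is positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryCounts:
    """Confusion-matrix counts for a binary classifier.

    Assumption (not stated in the abstract): "malignant" is the positive class.
    """
    tp: int  # malignant predicted as malignant
    fp: int  # benign predicted as malignant
    fn: int  # malignant predicted as benign
    tn: int  # benign predicted as benign


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero.
    return num / den if den else 0.0


def precision(c: BinaryCounts) -> float:
    # Fraction of predicted-malignant images that are truly malignant.
    return _safe_div(c.tp, c.tp + c.fp)


def recall(c: BinaryCounts) -> float:
    # Fraction of truly malignant images the model flags as malignant.
    return _safe_div(c.tp, c.tp + c.fn)


def accuracy(c: BinaryCounts) -> float:
    # Fraction of all images classified correctly.
    return _safe_div(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)


def f1_from_pr(p: float, r: float) -> float:
    # Harmonic mean of precision and recall.
    return _safe_div(2 * p * r, p + r)


def f1(c: BinaryCounts) -> float:
    return f1_from_pr(precision(c), recall(c))


if __name__ == "__main__":
    # Recompute F1 from the ViT precision/recall values quoted in the abstract.
    for label, p, r in [
        ("original images", 0.9222, 0.9379),
        ("enhanced images", 0.9417, 0.9549),
    ]:
        print(f"{label}: F1 = {f1_from_pr(p, r):.4f}")
```

Running the script prints `original images: F1 = 0.9300` and `enhanced images: F1 = 0.9483`, which matches the reported 93.00% and 94.83% F1-scores, so the quoted precision, recall, and F1 values are mutually consistent.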
Yasin Özkan studied this question.