What question did this study set out to answer?

The research aims to improve skin lesion classification using a novel deep learning framework that enhances interpretability.

March 29, 2026 | Open Access

DermaScanAI: an explainable hybrid deep learning framework for automated skin lesion classification using dual attention and metadata fusion

Key Points

Developed a hybrid framework combining multi-scale convolutional feature extraction with lightweight transformer encoders.
Incorporated squeeze-and-excitation blocks for improved channel-wise attention.
Utilized Grad-CAM for post-hoc interpretability.
Evaluated model performance on the HAM10000 dataset across seven skin lesion categories.
Achieved 94.8% overall accuracy in skin lesion classification.
Obtained a macro-average F1-score of 91.9%.
Reached an AUC of 0.957, indicating strong classification ability.
Demonstrated robustness across various lesion categories through class-wise performance analysis.
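
The gap between overall accuracy and macro-average F1 reported above is typical of class-imbalanced data, where a dominant class can lift accuracy while rarer classes pull the macro average down. The following is a minimal sketch of how both metrics can be computed from a confusion matrix. The function name and the toy 3-class matrix are illustrative assumptions, not the authors' evaluation code or HAM10000 results.

```python
def accuracy_and_macro_f1(confusion):
    """Compute overall accuracy and macro-average F1.

    confusion[i][j] = number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))

    f1_scores = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(confusion[k]) - tp                       # true k, predicted otherwise
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)

    # Macro averaging weights every class equally, regardless of its size.
    return correct / total, sum(f1_scores) / n, f1_scores


# Hypothetical imbalanced toy data: class 0 dominates the sample count.
toy = [
    [90, 5, 5],
    [2, 6, 2],
    [1, 1, 8],
]
acc, macro_f1, per_class = accuracy_and_macro_f1(toy)
print(f"accuracy={acc:.3f}  macro-F1={macro_f1:.3f}")
```

In this toy case the macro-average F1 falls below the accuracy because the two small classes are classified less reliably than the large one, which is why class-wise analysis is reported alongside the headline figures.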

Abstract

Skin cancer is increasingly common and can be fatal if not diagnosed and treated promptly. Automated skin lesion classification based on dermoscopic images has attracted significant attention, especially with the rapid rise of deep learning-based methods. Yet existing approaches still suffer from limitations, including insufficient multi-scale representation, a lack of global context modelling, and less effective feature recalibration, which undermine classification reliability and robustness. Furthermore, the lack of transparency in many deep learning models erodes trust and acceptance among healthcare providers. To tackle these issues, we propose a new deep learning framework that combines multi-scale convolutional feature extraction with lightweight Transformer encoders, while incorporating squeeze-and-excitation (SE) blocks to improve channel-wise attention. By exploiting spatial granularity, contextual richness, and adaptive feature recalibration across latent class-specific patterns in dermoscopic images, the proposed model classifies images from the HAM10000 dataset into seven skin lesion categories. It uses a multi-stage architecture to embed both local and global patterns, and Grad-CAM as a post-hoc explainability method to promote interpretability. We conduct experimental evaluations on the publicly available HAM10000 dataset, and the results indicate that the proposed strategy achieves performance competitive with state-of-the-art methods, reaching 94.8% overall accuracy, 91.9% macro-average F1-score, and 0.957 AUC. Class-wise performance analysis and ROC curves demonstrate robustness across a range of lesion categories, while ablation experiments verify the individual contributions of each architectural component. In conclusion, our framework outlines a possible computational approach that combines interpretability with noise resilience to refine automated classification of skin lesions. However, further prospective, multi-institutional validation will be necessary before it can inform future clinical decision support systems.
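
The squeeze-and-excitation (SE) recalibration mentioned in the abstract can be illustrated with a small dependency-free sketch. It shows the generic SE pattern (global average pooling, a two-layer bottleneck, sigmoid gates, channel-wise rescaling) and is not the authors' implementation. The function name, the omission of biases, the tiny 2-channel input, and the weight values are all assumptions chosen for readability; the abstract does not state the reduction ratio or layer sizes.

```python
import math


def se_recalibrate(feature_map, w1, w2):
    """Generic squeeze-and-excitation step on one feature map.

    feature_map: list of C channels, each an H x W list of lists.
    w1: bottleneck weights, shape (C // r) x C.  (hypothetical values)
    w2: expansion weights, shape C x (C // r).   (hypothetical values)
    Biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[g * v for v in row] for row in ch]
        for ch, g in zip(feature_map, gates)
    ]
    return scaled, gates


# Toy input: 2 channels of size 2 x 2, with illustrative weights.
fmap = [
    [[1.0, 3.0], [1.0, 3.0]],  # channel 0, mean 2.0
    [[1.0, 1.0], [1.0, 1.0]],  # channel 1, mean 1.0
]
w1 = [[1.0, -1.0]]             # 2 channels -> 1 hidden unit
w2 = [[2.0], [-2.0]]           # 1 hidden unit -> 2 gates
scaled, gates = se_recalibrate(fmap, w1, w2)
print([round(g, 3) for g in gates])
```

Each gate lies strictly between 0 and 1, so the block can only attenuate channels relative to one another; in a trained network the weights are learned so that informative channels are attenuated less.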
