Abstract

Background: Our aim is to demonstrate that multimodal deep learning, which uses both images and textual descriptions (e.g., demographics, clinical information), can classify skin lesions more accurately than a classifier that learns only from images.

Methods: We used the HAM10000 and ISIC 2017 datasets, which contain 10,000 and 2,750 skin lesion images, respectively. We combined the images with patients' data (e.g., sex, age, lesion location) to train and evaluate a multimodal deep learning classification model. The dataset was split into 70% for training, 20% for validation, and 10% for testing. We compared the multimodal model's performance with that of well-known deep learning models that use only images for classification.

Results: We used accuracy and area under the receiver operating characteristic curve (AUCROC) as the metrics to compare the models' performance. Our multimodal model outperformed the image-only competitors. On HAM10000, our model's accuracy and AUCROC were 0.9411 and 0.9426, respectively. On ISIC 2017, they were 0.7971 and 0.8253, respectively.

Conclusion: Our study showed that a multimodal deep learning model can outperform traditional image-only deep learning models for skin lesion classification on the HAM10000 and ISIC 2017 datasets. Our approach can enable primary care clinicians to screen for skin cancer with higher accuracy and reliability in patients who live in areas lacking access to expert dermatologists.
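The 70/20/10 split and the two reported metrics can be illustrated with a minimal sketch. It is not the authors' code. The function names (`split_70_20_10`, `accuracy`, `auroc`) are hypothetical, the split is assumed to be a simple seeded shuffle (the abstract does not say whether it was stratified), and the AUCROC is shown only for the binary case, whereas the paper's multi-class averaging scheme is not stated.

```python
import random


def split_70_20_10(items, seed=0):
    """Shuffle items and split them into train/validation/test (70/20/10).

    Assumption: a plain random shuffle; the abstract does not describe
    stratification or per-dataset handling.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 70 // 100  # integer arithmetic avoids float rounding
    n_val = n * 20 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, about 10%
    return train, val, test


def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def auroc(labels, scores):
    """Binary AUROC: probability that a random positive outscores a
    random negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    train, val, test = split_70_20_10(range(10000))
    print(len(train), len(val), len(test))
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

For a 10,000-image dataset this yields 7,000 training, 2,000 validation, and 1,000 test images. A real pipeline would more likely use a library implementation of these metrics; the pure-Python versions here only make the definitions explicit.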
Adebiyi et al. studied this question.