What question did this study set out to answer?

To evaluate the real-world diagnostic performance of an AI algorithm in skin cancer screening at the histopathological level.

June 11, 2026 | Open Access

Real-world diagnostic performance of artificial intelligence in skin cancer screening

Key Points

To evaluate the real-world diagnostic performance of an AI algorithm in skin cancer screening at the histopathological level.
Prospective observational study of AI assessments of skin cancer referrals over 3 months.
AI algorithm used: Deep Ensemble for Recognition of Malignancy (DERM).
Comparison of AI and dermatologist diagnoses against histopathological outcomes.
AI sensitivity was 95.3% (95% CI 90.5–98.1), higher than the dermatologists' 88.5% (P = 0.006).
AI's positive predictive value was 46.5% (95% CI 44.6–48.5), vs. the dermatologists' 62.1% (95% CI 53.9–67.7).
AI correctly identified the precise diagnosis 28.6% of the time, while dermatologists did so 61.6% of the time (P < 0.001).

Abstract

Background: The use of artificial intelligence (AI) in many aspects of society is expanding rapidly. AI has recently been employed in the National Health Service skin cancer referral pathway. We sought to assess the real-world diagnostic performance of AI in this setting.

Objectives: To assess the real-world diagnostic performance, at the histopathological level, of AI employed in the urgent skin cancer screening pathway.

Methods: This was a prospective observational study of the first 3 months of skin cancer referrals assessed by the Deep Ensemble for Recognition of Malignancy (DERM) AI algorithm in a tertiary care dermatology department in the North West of England. All lesions assessed by the algorithm were included in the analysis. Participant data were retrieved from medical records. Outcomes assessed included the AI diagnosis, whether a human review of the AI diagnosis occurred, the face-to-face dermatologist's diagnosis and the outcome of the dermatologist's assessment. In particular, the final histopathological diagnosis was compared with the AI and dermatologist diagnoses.

Results: AI had a sensitivity of 95.3% [95% confidence interval (CI) 90.5–98.1], which compared favourably with dermatologists (88.5%, 95% CI 82.3–93.2; P = 0.006). The positive predictive value of the AI algorithm was lower, at 46.5% (95% CI 44.6–48.5), compared with 62.1% (95% CI 53.9–67.7) for dermatologists. A total of 318 AI assessments with no remote human review went on to have their lesions reviewed by a dermatologist and biopsied. AI correctly identified the precise diagnosis 28.6% of the time, compared with 61.6% of the time for dermatologists (P < 0.001). The correct tumour/lesion type was identified by AI 51.4% of the time and by dermatologists 75.5% of the time (P < 0.001). In lesions that the AI deemed benign, and that would have been discharged with no human review, four cancers were diagnosed.

Conclusions: AI has high sensitivity in the detection of skin cancer. However, the diagnostic accuracy of the information provided by AI to clinicians is low and could be further optimized to reduce the risk of automation bias. Furthermore, this study suggests that removing human validation of AI decisions may be premature because of the potential for missed cancer diagnoses.
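The abstract reports sensitivity and positive predictive value (PPV) with 95% confidence intervals but does not include the underlying counts or say how the intervals were computed. As a minimal sketch of how such figures are typically derived, the Python below computes both metrics from true-positive, false-negative and false-positive counts and attaches a Wilson score interval to each. The counts are hypothetical placeholders, not data from the study, and the Wilson interval is an assumption chosen for illustration; the authors may have used a different method (for example, an exact binomial interval).

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half_width, centre + half_width


def sensitivity(tp, fn):
    """Proportion of histologically confirmed cancers that were flagged."""
    return tp / (tp + fn)


def positive_predictive_value(tp, fp):
    """Proportion of flagged lesions that were confirmed as cancer."""
    return tp / (tp + fp)


# Hypothetical counts for illustration only; NOT taken from the study.
tp, fn, fp = 90, 10, 110

sens = sensitivity(tp, fn)
ppv = positive_predictive_value(tp, fp)
sens_ci = wilson_interval(tp, tp + fn)
ppv_ci = wilson_interval(tp, tp + fp)

print(f"Sensitivity: {sens:.1%} (95% CI {sens_ci[0]:.1%} to {sens_ci[1]:.1%})")
print(f"PPV: {ppv:.1%} (95% CI {ppv_ci[0]:.1%} to {ppv_ci[1]:.1%})")
```

The sketch also shows why the two metrics can diverge as they do in the results: sensitivity depends only on how many true cancers are caught, whereas PPV falls as the number of benign lesions flagged as suspicious rises.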

Cite This Study

Earnshaw et al. studied this question.

synapsesocial.com/papers/6a2a523480c8f91e7f39e4e6 https://doi.org/10.1093/skinhd/vzag068

