Abstract

Background
The use of artificial intelligence (AI) in many aspects of society is expanding rapidly. AI has recently been employed in the National Health Service skin cancer referral pathway. We sought to assess the real-world diagnostic performance of AI in this setting.

Objectives
To assess the real-world diagnostic performance, at the histopathological level, of AI employed in the urgent skin cancer screening pathway.

Methods
This was a prospective observational study of the first 3 months of skin cancer referrals assessed by the Deep Ensemble for Recognition of Malignancy (DERM) AI algorithm in a tertiary care dermatology department in the North West of England. All lesions assessed by the algorithm were included in the analysis. Participant data were retrieved from medical records. Outcomes assessed included the AI diagnosis, whether a human review of the AI diagnosis occurred, the face-to-face dermatologist's diagnosis and the outcome of the dermatologist's assessment. In particular, the final histopathological diagnosis was compared with the AI and dermatologist diagnoses.

Results
AI had a sensitivity of 95.3% [95% confidence interval (CI) 90.5–98.1], which compared favourably with that of dermatologists (88.5%, 95% CI 82.3–93.2; P = 0.006). The positive predictive value of the AI algorithm was lower, at 46.5% (95% CI 44.6–48.5), compared with 62.1% (95% CI 53.9–67.7) for dermatologists. A total of 318 AI assessments with no remote human review went on to have their lesions reviewed by a dermatologist and biopsied. AI correctly identified the precise diagnosis 28.6% of the time, compared with 61.6% of the time for dermatologists (P < 0.001). The correct tumour/lesion type was identified by AI 51.4% of the time and by dermatologists 75.5% of the time (P < 0.001). In lesions that the AI deemed benign, and that would have been discharged with no human review, four cancers were diagnosed.

Conclusions
AI has high sensitivity in the detection of skin cancer. However, the diagnostic accuracy of the information provided by AI to clinicians is low and could be further optimized to reduce the risk of automation bias. Furthermore, this study suggests that removing human validation of AI decisions may be premature because of the potential for missed cancer diagnoses.
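
For readers less familiar with the metrics reported in the Results, the following is a minimal sketch of how sensitivity and positive predictive value are derived from histopathology-confirmed counts, together with an approximate 95% confidence interval. The counts are hypothetical and are not taken from the study. The abstract does not state which interval method the authors used, so the Wilson score interval here is an assumption chosen because it needs only the standard library. The function names are illustrative.

```python
import math


def wilson_interval(successes, total, z=1.959964):
    """Wilson score interval for a binomial proportion (about 95% for the default z).

    Assumption: the study's own CI method is not given in the abstract;
    Wilson is used here purely for illustration.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


def sensitivity(tp, fn):
    """Proportion of histologically confirmed cancers that were flagged."""
    return tp / (tp + fn)


def positive_predictive_value(tp, fp):
    """Proportion of flagged lesions that were confirmed as cancer."""
    return tp / (tp + fp)


# Hypothetical counts for illustration only; NOT data from the study.
tp, fn, fp = 90, 10, 110

sens = sensitivity(tp, fn)
sens_lo, sens_hi = wilson_interval(tp, tp + fn)

ppv = positive_predictive_value(tp, fp)
ppv_lo, ppv_hi = wilson_interval(tp, tp + fp)

print(f"Sensitivity: {sens:.1%} (95% CI {sens_lo:.1%} to {sens_hi:.1%})")
print(f"PPV:         {ppv:.1%} (95% CI {ppv_lo:.1%} to {ppv_hi:.1%})")
```

The sketch also shows why the two metrics can diverge, as they do in the Results: sensitivity depends only on confirmed cancers (true positives and false negatives), whereas positive predictive value falls as the number of false positives grows.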
Earnshaw et al. studied this question.