Background
Several new mobile applications (apps) use artificial intelligence (AI) to diagnose skin lesions.

Objective
The goal of this study was to evaluate the diagnostic accuracy of the most popular smartphone apps using a database of skin lesion images with diverse skin tones. An additional goal was to measure the apps' sensitivity and specificity in detecting skin cancer.

Methods
A thorough search of the Google Play Store and Apple App Store was performed to find the most popular apps that diagnose skin lesions. We used the Stanford Diverse Dermatology Images (DDI) database to test the accuracy of the following apps: ChatGPT (OpenAI, San Francisco, CA, USA), AI skin scanner Rash Detector (I Lov Guitars Inc., Scarborough, ON), Rash ID (Appsmiths LLC, Canton, MS, USA), and Skin Scanner Dermatology & Acne (ACINA, UAB, Krokuvos, Vilnius, Lithuania). One hundred and two images covering a range of diagnoses were selected for upload to each app: 51 showed malignant lesions, and 51 showed benign lesions. We also trained a new model of ChatGPT using a separate set of 554 images from the same database.

Results
All the apps had low diagnostic accuracy. The overall accuracy was 22%. When classifying lesions as benign versus malignant, the apps had an average sensitivity of 46.57% and an average specificity of 72.06%. The average positive predictive value was 67.44%, and the average negative predictive value was 58.06%. In our study, training ChatGPT did not improve its diagnostic accuracy.

Conclusions
ChatGPT, Rash Detector, Rash ID, and Skin Scanner Dermatology & Acne performed poorly at diagnosing skin lesions from a database with diverse skin tones. These apps should not be used as stand-alone diagnostic tools.
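The sensitivity, specificity, and predictive values reported in the Results follow from each app's benign-versus-malignant confusion counts. The sketch below shows the standard definitions and a simple per-app average. The counts for the two apps are hypothetical placeholders chosen only to match the 51 malignant and 51 benign images per app; they are not the study's data, and the names `ConfusionCounts`, `binary_metrics`, and `average_metrics` are illustrative. Averaging per-app values is an assumption about how the reported averages could have been computed.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable


@dataclass(frozen=True)
class ConfusionCounts:
    """Benign-versus-malignant outcomes for one app (hypothetical container)."""
    tp: int  # malignant images the app called malignant
    fn: int  # malignant images the app called benign
    tn: int  # benign images the app called benign
    fp: int  # benign images the app called malignant


def binary_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard screening metrics as fractions in [0, 1].

    Raises ZeroDivisionError if a denominator is zero, for example when
    an app never predicts malignant (PPV is then undefined).
    """
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # TP / all malignant
        "specificity": c.tn / (c.tn + c.fp),  # TN / all benign
        "ppv": c.tp / (c.tp + c.fp),          # TP / all called malignant
        "npv": c.tn / (c.tn + c.fn),          # TN / all called benign
    }


def average_metrics(apps: Iterable[ConfusionCounts]) -> Dict[str, float]:
    """Unweighted mean of each metric across apps."""
    per_app = [binary_metrics(c) for c in apps]
    if not per_app:
        raise ValueError("at least one app is required")
    return {name: mean(m[name] for m in per_app) for name in per_app[0]}


if __name__ == "__main__":
    # Hypothetical counts: each app sees 51 malignant and 51 benign images.
    apps = {
        "app_a": ConfusionCounts(tp=24, fn=27, tn=37, fp=14),
        "app_b": ConfusionCounts(tp=20, fn=31, tn=40, fp=11),
    }
    for name, counts in apps.items():
        print(name, {k: f"{v:.2%}" for k, v in binary_metrics(counts).items()})
    print("average", {k: f"{v:.2%}" for k, v in average_metrics(apps.values()).items()})
```

Because every app is tested on the same number of malignant and benign images, averaging per-app sensitivity and specificity gives the same result as pooling the counts. That is not generally true for the predictive values, whose denominators depend on how many lesions each app labels malignant or benign.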
Shah et al. studied this question.