Background/Objectives: This study aimed to evaluate the diagnostic accuracy of a commercial artificial intelligence (AI) algorithm, originally developed for screening mammography, when applied to symptomatic women presenting to a tertiary outpatient clinic. Methods: This single-center, retrospective diagnostic accuracy study included women who presented with breast symptoms to a tertiary outpatient clinic between January and June 2013 and underwent digital mammography. A U.S. Food and Drug Administration (FDA)-cleared AI algorithm was applied to all mammograms and generated continuous malignancy scores ranging from 1 to 100. Mammographic breast density was classified according to the American College of Radiology Breast Imaging Reporting and Data System (BI-RADS) by two experienced radiologists. Histopathology, when available, or otherwise a minimum of 2 years of clinical and imaging follow-up served as the reference standard. Diagnostic performance was assessed using receiver operating characteristic (ROC) analysis with calculation of the area under the curve (AUC) and 95% confidence intervals (CI) derived by patient-level bootstrap resampling (n = 2000 resamples). Analyses were performed for the overall cohort and stratified by breast density (non-dense BI-RADS A–B vs. dense BI-RADS C–D). Results: A total of 78 women (mean age, 55 ± 11 years) were included. Of these, 16 had histopathological verification of suspicious lesions, with breast cancer proven in 14, and 62 were classified based on follow-up alone. In the overall cohort (156 breasts, including 15 breasts with malignancies), the AI algorithm achieved an AUC of 0.96 (95% CI: 0.86–1.00). Performance remained high in non-dense breasts (AUC = 0.96; 95% CI: 0.88–1.00) and dense breasts (AUC = 0.99; 95% CI: 0.93–1.00), with no statistically significant difference observed between density subgroups (DeLong test, p = 0.36), although subgroup comparisons were underpowered. 
Decision curve analysis suggested a consistent positive net benefit across a wide range of threshold probabilities in both density groups. Conclusions: In this preliminary, single-center retrospective cohort, a screening-trained AI algorithm showed promising diagnostic accuracy when applied to symptomatic mammograms. These findings require validation in larger, contemporary, multicenter cohorts before clinical implementation.
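The patient-level bootstrap named in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function names, the `(patient_id, score, label)` record layout, and the use of a percentile interval are assumptions made here for illustration, since the abstract does not state which bootstrap interval was used. The key idea shown is that patients, not individual breasts, are resampled, so both breasts of a woman always enter or leave a resample together.

```python
import random

def auc(scores, labels):
    """Mann-Whitney estimate of the ROC AUC; tied scores count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def patient_bootstrap_ci(records, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the AUC, resampling whole patients with replacement.

    records: list of (patient_id, score, label) tuples, one per breast
    (hypothetical layout, assumed for this sketch).
    """
    by_patient = {}
    for pid, score, label in records:
        by_patient.setdefault(pid, []).append((score, label))
    ids = list(by_patient)
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Draw as many patients as the cohort holds; each brings all her breasts.
        sample = [by_patient[rng.choice(ids)] for _ in ids]
        scores = [s for breasts in sample for s, _ in breasts]
        labels = [y for breasts in sample for _, y in breasts]
        value = auc(scores, labels)
        if value is not None:  # skip resamples that lack one of the classes
            stats.append(value)
    stats.sort()
    lower = stats[int((alpha / 2) * (len(stats) - 1))]
    upper = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lower, upper
```

Resampling at the patient level keeps the within-patient correlation between the two breasts intact, which a breast-level bootstrap would ignore and which would tend to produce intervals that are too narrow.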
Ngo et al. (Wed,) studied this question.
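For readers unfamiliar with decision curve analysis, the net benefit reported in the Results follows the standard definition: true positives per patient minus false positives per patient, the latter weighted by the odds of the threshold probability. The sketch below assumes the AI output has already been mapped to a predicted probability of malignancy. How the study converted its 1 to 100 scores for this purpose is not stated in the abstract, so that mapping and the function names here are hypothetical.

```python
def net_benefit(probabilities, labels, threshold):
    """Net benefit of acting on every case with predicted risk >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 0)
    odds = threshold / (1 - threshold)  # harm of a false positive relative to a true positive
    return tp / n - (fp / n) * odds

def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every case as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

A model is considered useful at a given threshold probability when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero by definition.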