What question did this study set out to answer?

To evaluate the effectiveness of an AI algorithm designed for screening mammography when used for women with breast symptoms.

March 27, 2026
Open Access

Performance of a Screening Mammography AI Algorithm Repurposed for Symptomatic Mammography in a Tertiary Outpatient Clinic

Key Points

Conducted a retrospective study at a single center involving symptomatic women who underwent digital mammography.
Applied an FDA-cleared AI algorithm to generate malignancy scores from mammograms.
Used histopathology and follow-up data as reference standards to assess diagnostic performance.
Performed receiver operating characteristic analysis to evaluate accuracy across breast density subgroups.
The AI algorithm achieved an area under the curve (AUC) of 0.96 for the overall cohort.
High accuracy was maintained in both non-dense (AUC = 0.96) and dense breasts (AUC = 0.99).
No significant performance difference was found between breast density groups (p = 0.36).
Decision curve analysis suggested a positive net benefit for AI scores across a wide range of threshold probabilities in both density groups.
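As a rough illustration of the ROC analysis listed above, the sketch below computes an AUC from per-breast scores and a percentile confidence interval obtained by resampling patients rather than breasts, so that both breasts of one woman always stay together. It is not the study's code: the function names, the `patients` data layout, the choice of a percentile interval, and the toy numbers are all assumptions made for illustration.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a malignant breast outscores a benign one.

    Ties between a malignant and a benign score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign breast")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def patient_bootstrap_ci(patients, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the AUC, resampling whole patients with replacement.

    patients: dict mapping a patient id to a list of (score, label) pairs,
    one pair per breast (hypothetical layout).
    """
    all_breasts = [b for breasts in patients.values() for b in breasts]
    # Fails early if the full data set lacks one of the two classes.
    auc([s for s, _ in all_breasts], [y for _, y in all_breasts])

    rng = random.Random(seed)
    ids = list(patients)
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(ids) for _ in ids]
        breasts = [b for pid in sample for b in patients[pid]]
        labels = [y for _, y in breasts]
        # Skip resamples that happen to contain only one class.
        if 0 < sum(labels) < len(labels):
            stats.append(auc([s for s, _ in breasts], labels))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Synthetic toy data, NOT taken from the study: (AI score, malignant?) per breast.
toy = {
    "p1": [(92, 1), (8, 0)],
    "p2": [(15, 0), (11, 0)],
    "p3": [(30, 1), (35, 0)],
    "p4": [(40, 0), (5, 0)],
}
toy_breasts = [b for breasts in toy.values() for b in breasts]
toy_auc = auc([s for s, _ in toy_breasts], [y for _, y in toy_breasts])
toy_ci = patient_bootstrap_ci(toy, n_boot=200)
```

Resampling at the patient level matters here because the two breasts of one woman are not independent observations; resampling individual breasts would tend to give intervals that are too narrow.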

Abstract

Background/Objectives: The aim of the study was to evaluate the diagnostic accuracy of a commercial artificial intelligence (AI) algorithm originally developed for screening mammography when applied to symptomatic women presenting to a tertiary outpatient clinic. Methods: This single-center, retrospective diagnostic accuracy study included women who presented with breast symptoms to a tertiary outpatient clinic between January and June 2013 and underwent digital mammography. An AI algorithm cleared by the U.S. Food and Drug Administration (FDA) was applied to all mammograms and generated continuous malignancy scores ranging from 1 to 100. Mammographic breast density was classified according to the American College of Radiology Breast Imaging Reporting and Data System (BI-RADS) by two experienced radiologists. Histopathology, when available, or otherwise a minimum of 2 years of clinical and imaging follow-up served as the reference standard. Diagnostic performance was assessed using receiver operating characteristic (ROC) analysis with calculation of the area under the curve (AUC) and 95% confidence intervals (CI) derived by patient-level bootstrap resampling (n = 2000 resamples). Analyses were performed for the overall cohort and stratified by breast density (non-dense BI-RADS A–B vs. dense BI-RADS C–D). Results: A total of 78 women (mean age, 55 ± 11 years) were included. Sixteen had histopathological verification of suspicious lesions, with breast cancer proven in 14 patients; the remaining 62 were classified based on follow-up alone. In the overall cohort (156 breasts, including 15 breasts with malignancies), the AI algorithm achieved an AUC of 0.96 (95% CI: 0.86–1.00). Performance remained high in non-dense breasts (AUC = 0.96; 95% CI: 0.88–1.00) and dense breasts (AUC = 0.99; 95% CI: 0.93–1.00), with no statistically significant difference observed between density subgroups (DeLong test, p = 0.36), although subgroup comparisons were underpowered. Decision curve analysis suggested a consistent positive net benefit across a wide range of threshold probabilities in both density groups. Conclusions: In this preliminary, single-center retrospective cohort, a screening-trained AI algorithm showed promising diagnostic accuracy when applied to symptomatic mammograms. These findings require validation in larger, contemporary, multicenter cohorts before clinical implementation.
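For readers unfamiliar with decision curve analysis, the following minimal sketch shows the standard net benefit calculation at a single threshold probability, alongside the "work up everyone" reference strategy it is usually compared against. It assumes the AI output has already been converted to a predicted probability of malignancy; how the study mapped its 1 to 100 scores to probabilities is not stated here, and the function names and toy numbers are hypothetical.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of acting on every case with predicted probability >= threshold.

    Net benefit = TP/N - FP/N * threshold / (1 - threshold).
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of acting on every case regardless of the model."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Synthetic toy data, NOT taken from the study.
toy_probs = [0.9, 0.8, 0.2, 0.1]
toy_labels = [1, 0, 0, 0]

# One point per threshold; plotting these against the threshold gives the curve.
toy_curve = [
    (t, net_benefit(toy_probs, toy_labels, t), net_benefit_treat_all(toy_labels, t))
    for t in (0.1, 0.2, 0.3, 0.4, 0.5)
]
```

A model shows a positive net benefit at a given threshold when its value is above zero (the "work up no one" strategy) and, ideally, above the treat-all value at the same threshold.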
