What does this research mean for the field?

Google's mammography AI system detected breast cancer with higher sensitivity than first readers, raising the cancer detection rate from 7.54 to 9.33 per 1,000 women. Novelty: novel finding. Consensus alignment: supports consensus.

What question did this study set out to answer?

The aim was to evaluate the diagnostic accuracy and clinical implementation of an AI system for breast cancer screening.

March 13, 2026 | Open Access

Diagnostic accuracy, fairness and clinical implementation of AI for breast cancer screening: results of multicenter retrospective and prospective technical feasibility studies

Key Points

Retrospective evaluation of 115,973 mammograms from five National Health Service services
Prospective deployment across 12 sites with 9,266 cases
Comparison of AI performance against first and second readers
Analysis of breast-level data and cancer detection rates
AI showed superior sensitivity compared to the first reader (0.541 vs. 0.437, P < 0.001)
Specificity was noninferior at the 5% margin (0.943 vs. 0.952, P < 0.001)
Cancer detection rate rose from 7.54 to 9.33 per 1,000 women, with AI detecting 25% of interval cancers
First screens exhibited 39.3% fewer recalls and 8.8% higher detection rates
Prospective deployment indicated a need for threshold recalibration

Abstract

Artificial intelligence (AI) promises to enhance breast cancer screening. Here we evaluated Google's mammography AI system (version 1.2) across two phases: a retrospective study using 115,973 mammograms from five National Health Service screening services with 39-month follow-up and prospective noninterventional feasibility deployment at 12 sites (9,266 cases). The primary endpoint was AI sensitivity and specificity versus first reader using a 5% noninferiority margin. The secondary endpoints were performance versus second or consensus readers and breast-level analyses. Retrospectively, AI achieved superior sensitivity (0.541 versus 0.437 for first reader, P < 0.001) and noninferior specificity (0.943 versus 0.952, P < 0.001). Cancer detection rate increased from 7.54 to 9.33 per 1,000 women, with AI detecting 25.0% of interval cancers. Performance was particularly strong for first screens (39.3% fewer recalls, 8.8% higher detection) and invasive cancers. No systematic demographic disparities were observed. Simulated second-reader replacement reduced reading time by 32% while increasing detection by 17.7%. Prospective deployment confirmed technical feasibility but revealed a distribution shift requiring threshold recalibration. Implementation requires adaptive calibration and continuous monitoring to ensure safety and equity.
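The headline figures above can be related to one another with simple arithmetic. The following is a minimal sketch, not the study's statistical analysis: it uses only the point estimates quoted in the abstract, and it assumes the 5% noninferiority margin is an absolute difference of 0.05 (the abstract does not say whether the margin is absolute or relative). The variable and function names are hypothetical. A formal noninferiority test would use confidence intervals, which are not reported here.

```python
# Point estimates quoted in the abstract (retrospective phase).
CDR_FIRST_READER = 7.54   # cancers detected per 1,000 women, first reader
CDR_WITH_AI = 9.33        # cancers detected per 1,000 women, AI
SENS_AI, SENS_READER = 0.541, 0.437
SPEC_AI, SPEC_READER = 0.943, 0.952

# ASSUMPTION: the "5% noninferiority margin" is treated as an absolute
# difference of 0.05; the abstract does not specify this.
MARGIN = 0.05


def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a fraction of `before`."""
    return (after - before) / before


# Absolute and relative gain in cancer detection rate.
cdr_abs_gain = CDR_WITH_AI - CDR_FIRST_READER
cdr_rel_gain = relative_change(CDR_FIRST_READER, CDR_WITH_AI)

# Differences in sensitivity and specificity (AI minus first reader).
sens_diff = SENS_AI - SENS_READER
spec_diff = SPEC_AI - SPEC_READER

# Point-estimate check only: is the specificity deficit smaller than the margin?
# This is NOT a substitute for the confidence-interval test used in the study.
spec_point_estimate_within_margin = spec_diff > -MARGIN

print(f"CDR gain: {cdr_abs_gain:.2f} per 1,000 women ({cdr_rel_gain:.1%} relative)")
print(f"Sensitivity difference: {sens_diff:+.3f}")
print(f"Specificity difference: {spec_diff:+.3f}")
print(f"Specificity point estimate within margin: {spec_point_estimate_within_margin}")
```

Under these assumptions, the detection-rate change works out to about 1.79 additional cancers per 1,000 women, a relative rise of roughly 23.7%, while the specificity point estimate is 0.009 lower than the first reader's, well inside an absolute 0.05 margin. The 17.7% detection increase quoted in the abstract refers to the simulated second-reader replacement scenario and is a separate figure.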
