What type of study is this?

This is a systematic review.

What question did this study set out to answer?

The review aims to evaluate the application of transfer learning in breast cancer classification using mammography by analyzing study characteristics and methodologies.

May 7, 2026 · Open Access

Deep Transfer Learning for Breast Cancer Classification in Mammography: A Systematic Review

Key Points

Conducted a systematic review following PRISMA guidelines
Synthesized 154 studies using pretrained convolutional neural networks
Evaluated reproducibility, code availability, and risk-of-bias factors
Analyzed external validation and patient-level data splitting
High diagnostic performance reported, but variability in datasets and methodologies observed
Limited external validation and reproducibility found across studies
Identified need for robust validation and transparent reporting across diverse datasets

Abstract

• Systematic review of transfer learning for mammography (2020–2025)
• PRISMA-based synthesis of 154 studies on pretrained CNNs
• Evaluates reproducibility, code availability, and bias risks
• Identifies gaps in external validation and patient-level splitting
• Future directions: multimodal fusion, federated learning, explainable AI

Deep transfer learning has been widely applied to mammography-based breast cancer classification, with many studies reporting high diagnostic performance. However, substantial variability in datasets, validation strategies, and reporting practices complicates interpretation and clinical relevance. A systematic review was conducted following PRISMA guidelines to identify studies published between 2020 and 2025 that applied pretrained convolutional neural networks to mammographic breast cancer classification. Study characteristics, datasets, architectures, validation strategies, performance metrics, reproducibility indicators, and risk-of-bias factors were extracted and synthesized using a structured narrative approach. A total of 154 studies were included. While many report high benchmark performance, these findings often arise under limited validation conditions and must be interpreted cautiously. External validation, patient-level data splitting, and transparent reporting of code and training configurations were uncommon. Comparative synthesis revealed that reported performance was strongly influenced by dataset characteristics and validation design, with more methodologically rigorous studies generally reporting moderate but potentially more reliable results. Deep transfer learning approaches show promise for mammographic breast cancer classification, but the current literature is characterized by substantial methodological heterogeneity, limited reproducibility, and risks of bias.
These findings highlight a persistent gap between benchmark performance and robust clinical applicability, underscoring the need for rigorous validation, transparent reporting, and evaluation on diverse contemporary datasets.
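
The abstract names patient-level data splitting as an uncommon practice. As a minimal sketch of what that term usually means (not a method taken from the review or from any included study), the Python below assigns whole patients to either the training or the test set, so that images from one patient never appear in both. The function name `patient_level_split`, the `(patient_id, image_id)` record layout, and the toy identifiers are all hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


def patient_level_split(records, test_fraction=0.2, seed=0):
    """Split (patient_id, image_id) records so no patient spans both sets.

    Hypothetical helper for illustration; not from the reviewed paper.
    """
    # Group every image under the patient it belongs to.
    by_patient = defaultdict(list)
    for patient_id, image_id in records:
        by_patient[patient_id].append(image_id)

    # Shuffle patients (not images) with a fixed seed for repeatability.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    # Hold out a fraction of patients, at least one.
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])

    train = [(p, i) for p in patients if p not in test_patients
             for i in by_patient[p]]
    test = [(p, i) for p in patients if p in test_patients
            for i in by_patient[p]]
    return train, test


# Assumed toy data: 5 patients, each with four standard mammographic views.
records = [
    (f"P{p:03d}", f"P{p:03d}_{view}")
    for p in range(5)
    for view in ("LCC", "LMLO", "RCC", "RMLO")
]
train, test = patient_level_split(records)

# The two sets share no patient identifiers.
train_ids = {p for p, _ in train}
test_ids = {p for p, _ in test}
print(train_ids.isdisjoint(test_ids))
```

Splitting at the image level instead would let different views of the same breast land on both sides of the split, which is one way reported performance can be inflated under limited validation conditions.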

Cite This Study

Oyekanmi et al. studied this question.

synapsesocial.com/papers/69fbf004164b5133a91a42eb https://doi.org/10.1016/j.cmpbup.2026.100251
