What question did this study set out to answer?

The study aims to enhance breast cancer detection through a novel automated and interpretable CAD system using ensemble transformers.

February 22, 2026 | Open Access

An Interpretable Ensemble Transformer Framework for Breast Cancer Detection in Ultrasound Images

Key points

Developed a computer-aided diagnosis (CAD) system that integrates ensemble transfer learning with Vision Transformer architectures.
Combined features from the Data-Efficient Image Transformer (DeiT) and the Vision Transformer (ViT) through concatenation-based fusion.
Employed preprocessing, normalization, and augmentation techniques to improve model robustness.
Applied Gradient-weighted Class Activation Mapping (Grad-CAM) to provide visual explanations for clinical interpretability.
Benchmarked the ensemble against multiple CNN and Transformer models on the Breast Ultrasound Images (BUSI) dataset.
Achieved 96.92% accuracy and 97.10% AUC for binary classification.
Obtained 94.27% accuracy and 94.81% AUC for three-class classification.
Generalized to three independent datasets with roughly 87% accuracy (86.77% to 87.76%).
Performance decreased for fine-grained BI-RADS classification, with accuracy as low as 68.75%, reflecting the complexity of that task.
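
The concatenation-based fusion named in the key points can be illustrated with a minimal sketch. It is not the authors' implementation: the embedding sizes, the random stand-in features, the untrained linear head, and the helper names `fuse_features`, `linear_head`, and `softmax` are all assumptions made for illustration. In the real system the two feature vectors would come from pretrained DeiT and ViT backbones.

```python
import math
import random

# Hypothetical embedding sizes; the source does not state the real dimensions.
DEIT_DIM = 4
VIT_DIM = 6
NUM_CLASSES = 3  # the three-class setting; class names are not given in the source


def fuse_features(deit_features, vit_features):
    """Concatenation-based fusion: append one embedding to the other."""
    return list(deit_features) + list(vit_features)


def linear_head(features, weights, biases):
    """Compute one logit per class from the fused feature vector."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(logits):
    """Convert logits to probabilities, shifting by the max for stability."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)

# Random stand-ins for the two backbone outputs on a single image (assumption).
deit_features = [rng.gauss(0, 1) for _ in range(DEIT_DIM)]
vit_features = [rng.gauss(0, 1) for _ in range(VIT_DIM)]

# The fused vector has DEIT_DIM + VIT_DIM entries and feeds one classifier.
fused = fuse_features(deit_features, vit_features)

# Untrained, randomly initialised head; a real system would learn these weights.
weights = [[rng.gauss(0, 0.1) for _ in range(len(fused))] for _ in range(NUM_CLASSES)]
biases = [0.0] * NUM_CLASSES

probabilities = softmax(linear_head(fused, weights, biases))
```

The point of the design is that the classifier sees both representations side by side, so it can weight whichever backbone is more informative for a given feature, rather than averaging two separate predictions.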

Abstract

Background/Objectives: Early and accurate detection of breast cancer is essential for reducing mortality and improving patient outcomes. However, manual interpretation of breast ultrasound images is challenging because of image variability, noise, and inter-observer subjectivity. This study addresses these limitations by developing an automated and interpretable computer-aided diagnosis (CAD) system. Methods: The proposed CAD system integrates ensemble transfer learning with Vision Transformer architectures. It combines the Data-Efficient Image Transformer (DeiT) and Vision Transformer (ViT) through concatenation-based feature fusion to exploit their complementary representations. Preprocessing, normalization, and targeted data augmentation enhance robustness, while Gradient-weighted Class Activation Mapping (Grad-CAM) provides visual explanations to support clinical interpretability. The proposed model is benchmarked against state-of-the-art CNNs (VGG16, ResNet50, DenseNet201) and Transformer models (ViT, DeiT, Swin, BEiT) using the Breast Ultrasound Images (BUSI) dataset. Results: The ensemble achieved 96.92% accuracy and 97.10% AUC for binary classification, and 94.27% accuracy with 94.81% AUC for three-class classification. External validation on independent datasets demonstrated strong generalizability, with 87.76%/88.07% accuracy/AUC on BrEaST, 86.77%/85.90% on BUS-BRA, and 86.99%/86.99% on BUSIWHU. Performance decreased for fine-grained BI-RADS classification (76.68%/84.59% accuracy/AUC on BUS-BRA and 68.75%/81.10% on BrEaST), reflecting the inherent complexity and subjectivity of clinical subclassification. Conclusions: The proposed Vision Transformer-based ensemble demonstrates high diagnostic accuracy, strong cross-dataset generalization, and clinically meaningful explainability. These findings highlight its potential as a reliable second-opinion CAD tool for breast cancer diagnosis, particularly in resource-limited clinical environments.
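
For readers less familiar with the two metrics reported above, the following sketch shows how accuracy and binary AUC are commonly computed. It is illustrative only: the labels and scores are made-up toy values, not data from the study, and the function names `accuracy` and `binary_auc` are hypothetical. AUC is computed here with the pairwise (Mann-Whitney) definition, which may differ from the tooling the authors used.

```python
def accuracy(labels, predictions):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def binary_auc(labels, scores):
    """Probability that a random positive scores higher than a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (not study data): 1 = malignant, 0 = not malignant.
labels = [0, 0, 1, 1, 1, 0]
scores = [0.10, 0.40, 0.35, 0.80, 0.90, 0.20]

# Accuracy depends on a decision threshold; 0.5 is an arbitrary choice here.
predictions = [1 if s >= 0.5 else 0 for s in scores]

toy_accuracy = accuracy(labels, predictions)  # 5 of 6 correct
toy_auc = binary_auc(labels, scores)          # 8 of 9 pairs ranked correctly
```

Accuracy depends on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, which is why the two numbers can diverge, as they do in the BI-RADS results.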
