What question did this study set out to answer?

To develop a unified framework for accurate breast cancer image segmentation using multimodal imaging.

May 22, 2026 · Open Access

Sequential Transfer Learning for Multi-Domain Breast Image Segmentation Using a Transformer-Enhanced Hybrid U-Net

Key Points

To develop a unified framework for accurate breast cancer image segmentation using multimodal imaging.
The framework integrates CNN and Transformer modules for feature extraction.
Incremental learning is applied via warm-start fine-tuning using previously trained weights.
Performance was evaluated on four public datasets and one local dataset, with data augmentation applied during training.
The framework achieved Dice scores of 0.974 on BUS-UCLM, 0.975 on BUSI, 0.971 on BreastDM, 0.904 on TNBC nuclei segmentation, and 0.982 on BCSD-2024.
Outperformed classical U-Net models across all datasets.
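
For readers unfamiliar with the metric behind the scores above, the Dice score measures the overlap between a predicted mask and a reference mask. The following is a minimal sketch in plain Python, not the authors' evaluation code; the function name `dice_score`, the smoothing term `eps`, and the toy masks are assumptions made for illustration.

```python
def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # eps (an assumed smoothing term) keeps the score defined when both masks are empty.
    return (2.0 * intersection + eps) / (total + eps)


# Toy 2 x 4 masks, flattened row by row (illustrative values only).
predicted = [0, 1, 1, 0, 0, 1, 1, 0]
reference = [0, 1, 1, 0, 0, 1, 0, 0]
print(round(dice_score(predicted, reference), 3))  # 0.857
```

A score of 1.0 means the two masks coincide exactly, and a score near 0 means they barely overlap.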

Abstract

Worldwide, breast cancer is the leading cause of cancer death in women, which underscores the need for accurate breast cancer detection systems. This study presents a unified framework for breast cancer segmentation across multiple imaging modalities, including histopathology, MRI, mammography, and ultrasound. The framework integrates a CNN with Transformer modules and has three core technical innovations. First, features are extracted using an encoder–decoder design. The encoder is built from Residual Blocks with a base width of 32 channels; the features are progressively widened over four stages (32 → 64 → 128 → 256 channels), while MaxPool2d operations reduce the spatial resolution from 256 × 256 to 128 × 128, 64 × 64, 32 × 32, and 16 × 16. After further convolutional refinement, a Transformer encoder is applied to the 16 × 16 feature maps in the bottleneck. The Transformer comprises four encoder layers with multi-head self-attention (eight heads) and an MLP ratio of 4.0, enabling the model to capture both local and global contextual dependencies at the lowest resolution. The framework is trained with a learning rate of 1 × 10⁻⁴ for up to 50 epochs, using a combined Dice and binary cross-entropy loss that balances overlap-based and pixel-wise objectives. Gradient clipping with a maximum norm of 5.0 is used for training stability, ReduceLROnPlateau (factor = 0.5, patience = 5) adjusts the learning rate dynamically, and early stopping (patience = 12) prevents overfitting. To improve generalization and robustness to data variability, data augmentation is applied: random horizontal and vertical flips, intensity variations, and small rotations (±15°). Incremental learning is implemented as a warm-start fine-tuning strategy, in which the model is initialized with weights learned by a previously trained model instead of being trained from scratch.
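
The encoder and bottleneck dimensions quoted in the abstract can be checked with a short shape trace. This is only a sketch of one plausible reading, not the authors' implementation. It assumes each stage is followed by a 2 × 2 max pool that halves height and width, and that the Transformer embedding width equals the last encoder width (256); neither detail is stated in the abstract. The names `trace_encoder` and `embed_dim` are illustrative.

```python
def trace_encoder(input_size=256, base_channels=32, num_stages=4):
    """Return [(channels, height_or_width), ...] per stage and the size after the last pool."""
    stages = []
    size, channels = input_size, base_channels
    for _ in range(num_stages):
        stages.append((channels, size))  # residual block operates at this resolution
        size //= 2                        # ASSUMPTION: MaxPool2d with kernel 2, stride 2
        channels *= 2                     # channel width doubles from stage to stage
    return stages, size


stages, bottleneck_size = trace_encoder()
for channels, size in stages:
    print(f"{channels:>3} channels at {size} x {size}")

# Transformer bottleneck: one token per spatial position of the 16 x 16 map.
num_tokens = bottleneck_size * bottleneck_size
embed_dim = 256                  # ASSUMPTION: not stated in the abstract
num_heads, mlp_ratio = 8, 4.0    # values taken from the abstract
head_dim = embed_dim // num_heads
mlp_hidden = int(embed_dim * mlp_ratio)
print(f"{num_tokens} tokens, {head_dim} dims per head, MLP hidden width {mlp_hidden}")
```

Under these assumptions, self-attention runs over 256 tokens, which is far cheaper than attending over the 65,536 positions of the full 256 × 256 input. That is the usual motivation for placing the Transformer at the lowest resolution.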
In practice, the warm start loads the saved checkpoint of the best-performing model and continues training on a new dataset. The framework is evaluated on four publicly available datasets and one local dataset: BUS-UCLM, BUSI, BreastDM, TNBC Nuclei Segmentation, and BCSD-2024. It achieves Dice scores of 0.974 on BUS-UCLM, 0.975 on BUSI, 0.971 on BreastDM, 0.904 on TNBC nuclei segmentation, and 0.982 on BCSD-2024, and consistently outperforms classical U-Net models.
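
The checkpoint-based warm start can be illustrated with a deliberately tiny stand-in model. The sketch below is not the authors' code: a two-parameter linear model replaces the segmentation network, a JSON file replaces the saved checkpoint, and the datasets, file name, and hyperparameters are all hypothetical. It shows only the control flow: train, save, reload, continue on new data.

```python
import json
import os
import tempfile


def mse(weights, data):
    """Mean squared error of the toy model y = w*x + b."""
    return sum((weights["w"] * x + weights["b"] - y) ** 2 for x, y in data) / len(data)


def train(weights, data, lr=0.1, epochs=200):
    """Plain gradient descent on MSE, starting from the given weights."""
    w, b = weights["w"], weights["b"]
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return {"w": w, "b": b}


# Hypothetical data: dataset_b is a slightly shifted version of dataset_a.
dataset_a = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
dataset_b = [(0.0, 1.2), (1.0, 3.1), (2.0, 5.3)]
scratch = {"w": 0.0, "b": 0.0}

with tempfile.TemporaryDirectory() as tmp:
    ckpt_path = os.path.join(tmp, "best_model.json")  # hypothetical checkpoint name

    # Stage 1: train on the first dataset and save the resulting weights.
    stage1 = train(scratch, dataset_a)
    with open(ckpt_path, "w") as f:
        json.dump(stage1, f)

    # Stage 2: initialize from the checkpoint instead of from `scratch`.
    with open(ckpt_path) as f:
        warm = json.load(f)

scratch_loss = mse(scratch, dataset_b)
warm_start_loss = mse(warm, dataset_b)
finetuned = train(warm, dataset_b, epochs=20)  # short fine-tuning run on the new data
finetuned_loss = mse(finetuned, dataset_b)

print(f"loss on new data: scratch init {scratch_loss:.3f}, "
      f"warm start {warm_start_loss:.3f}, after fine-tuning {finetuned_loss:.3f}")
```

In a PyTorch setting, the same pattern would typically use `torch.save(model.state_dict(), path)` and `model.load_state_dict(torch.load(path))` in place of the JSON round trip. Whether the authors also restore optimizer state is not stated in the abstract.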

Cite This Study

Manzoor et al. studied this question.

synapsesocial.com/papers/6a0ff3d9d674f7c03778cc4e https://doi.org/10.3390/bioengineering13050570
