Worldwide, breast cancer is the leading cause of cancer-related death in women, which underscores the need for accurate breast cancer detection systems. This study presents a unified framework for breast cancer segmentation across multiple imaging modalities, including histopathology, MRI, mammography, and ultrasound. The framework integrates a CNN with Transformer modules and has three core technical innovations. First, features are extracted using an encoder–decoder design. The encoder is built from Residual Blocks with a base width of 32 channels; the features are progressively widened over four stages (32 → 64 → 128 → 256 channels), while MaxPool2d operations reduce the spatial resolution from 256 × 256 to 128 × 128, 64 × 64, 32 × 32, and 16 × 16. After further convolutional refinement, a Transformer encoder is applied to the 16 × 16 feature maps in the bottleneck. The Transformer comprises four encoder layers with multi-head self-attention (eight heads) and an MLP ratio of 4.0, enabling the model to capture both local and global contextual dependencies at the lowest resolution. The proposed framework is trained with a learning rate of 1 × 10⁻⁴ for up to 50 epochs, using a combined Dice and binary cross-entropy loss that balances pixel-wise accuracy with overlap-based learning. Gradient clipping with a maximum norm of 5.0 is used to ensure training stability; ReduceLROnPlateau (factor = 0.5, patience = 5) dynamically adjusts the learning rate; and early stopping (patience = 12) is used to prevent overfitting. To improve generalization and robustness to data variability, data augmentation is applied, including random horizontal and vertical flips, intensity variations, and small rotations (±15°). Incremental learning is implemented as a warm-start fine-tuning strategy, in which the model is initialized with the learned weights of a previously trained model instead of being trained from scratch.
This is done by loading saved checkpoints of the best-performing model and continuing training on a new dataset. The proposed framework is evaluated on five datasets, four publicly available and one local: BUS-UCLM, BUSI, BreastDM, TNBC NucleiSegmentation, and BCSD-2024. It achieves Dice scores of 0.974 on BUS-UCLM, 0.975 on BUSI, 0.971 on BreastDM, 0.904 on TNBC NucleiSegmentation, and 0.982 on BCSD-2024, and consistently outperforms classical U-Net models.
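The encoder stages described above can be traced with a few lines of arithmetic. The following is a minimal sketch, not the authors' implementation: it assumes each of the four stages is followed by one pooling step that halves the spatial size (as a MaxPool2d with kernel size 2 and stride 2 would), and that the bottleneck Transformer treats each spatial position of the 16 × 16 map as one token. The function name `encoder_schedule` is hypothetical.

```python
def encoder_schedule(input_size=256, base_channels=32, num_stages=4):
    """Trace channel width and spatial size through the encoder stages.

    Assumption: every stage doubles the channel width of the previous one
    and is followed by a pooling step that halves the spatial size.
    """
    stages = []
    size, channels = input_size, base_channels
    for _ in range(num_stages):
        pooled = size // 2  # effect of a 2x2 max-pool with stride 2 (assumed)
        stages.append({"channels": channels, "in_size": size, "out_size": pooled})
        size, channels = pooled, channels * 2
    return stages


stages = encoder_schedule()
for index, stage in enumerate(stages, start=1):
    print(
        f"stage {index}: {stage['channels']} channels, "
        f"{stage['in_size']}x{stage['in_size']} -> "
        f"{stage['out_size']}x{stage['out_size']}"
    )

# Assumption: one token per spatial position of the bottleneck feature map.
bottleneck_size = stages[-1]["out_size"]
num_tokens = bottleneck_size * bottleneck_size
print(f"bottleneck: {bottleneck_size}x{bottleneck_size} map, {num_tokens} tokens")
```

Under these assumptions, self-attention in the bottleneck operates on a short sequence, which is one common reason for placing a Transformer at the lowest resolution rather than on the full 256 × 256 input.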
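To make the combined Dice and binary cross-entropy objective concrete, here is a small pure-Python sketch operating on flattened per-pixel probabilities. The abstract does not give the weighting or smoothing constants, so the equal weighting (`bce_weight=0.5`), the smoothing term (`smooth=1.0`), and the function name `dice_bce_loss` are assumptions made only for illustration.

```python
import math


def dice_bce_loss(probs, targets, smooth=1.0, bce_weight=0.5, eps=1e-7):
    """Weighted sum of binary cross-entropy and soft Dice loss.

    probs   -- predicted foreground probabilities, one per pixel (flattened)
    targets -- ground-truth labels, 0 or 1, same length as probs
    smooth, bce_weight, eps are illustrative assumptions, not values
    reported in the abstract.
    """
    if len(probs) != len(targets) or not probs:
        raise ValueError("probs and targets must be non-empty and equal in length")

    # Binary cross-entropy: pixel-wise term, averaged over all pixels.
    bce = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        bce -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    bce /= len(probs)

    # Soft Dice: overlap-based term computed over the whole mask.
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2.0 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)
    dice_loss = 1.0 - dice

    return bce_weight * bce + (1.0 - bce_weight) * dice_loss


mask = [1, 0, 1, 0]
perfect = dice_bce_loss([1.0, 0.0, 1.0, 0.0], mask)
uncertain = dice_bce_loss([0.5, 0.5, 0.5, 0.5], mask)
inverted = dice_bce_loss([0.0, 1.0, 0.0, 1.0], mask)
print(perfect < uncertain < inverted)
```

The cross-entropy term penalizes each pixel independently, while the Dice term depends on the overlap of the whole predicted mask with the ground truth, which is why the two are often combined when foreground regions are small relative to the image.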
Manzoor et al. (Mon,) studied this question.