Abstract Background Accurate segmentation of lesions in breast ultrasound images directly affects the assessment of lesion size, location, and morphological characteristics, and plays a crucial role in clinical diagnosis and treatment. However, the task remains highly challenging because of inherent difficulties such as noise interference, low contrast between the lesion and the surrounding tissues, and the complexity and diversity of lesions. Purpose To address these difficulties, we aimed to develop an efficient segmentation model that enhanced feature representation by integrating multi‐scale contextual information and improved lesion boundary detection through adaptive attention mechanisms. Specifically, we proposed the enhanced multi‐scale selective attention (EMSA) U‐Net to improve the segmentation accuracy of breast ultrasound images. Methods Training was conducted on the Breast Ultrasound Images (BUSI) dataset and the Breast Ultrasound Images from Wuhan University (BUSIWHU) dataset. The BUSIWHU dataset was randomly divided into a training set (n = 741) and a test set (n = 186), with labels covering normal tissues, benign tumors, and malignant tumors. The BUSI dataset was randomly divided into a training set (n = 452), a validation set (n = 110), and a test set (n = 195); it covered cases from different age groups and provided a rich sample of both benign and malignant tumors. For both datasets, benign and malignant lesions were merged into a single category for segmentation. All images were rescaled to a uniform resolution of 480 × 480 pixels and their pixel values were normalized. During training, data augmentation by random horizontal flipping and scaling was applied to improve the model's generalization capability. The images were then resized, batched, and fed into the model for training. The proposed model incorporated an EMSA block.
It used multi‐branch dilated convolution with different kernel sizes to dynamically adjust the receptive field, enabling the model to capture both fine‐grained details and global contextual features. Moreover, integrated gating mechanisms balanced the importance of the multi‐scale features, suppressing irrelevant background information and emphasizing disease‐specific patterns. In addition, the spatial kernel convolution (SKConv) module was introduced to selectively fuse spatial and semantic features. SKConv adaptively adjusted the convolution kernels based on the input features, enhancing the model's ability to distinguish lesion boundaries from noisy backgrounds. Model performance was evaluated primarily using the average Dice similarity coefficient (mDice) and the average intersection‐over‐union ratio (mIoU). Statistical significance was assessed with paired t‐tests (significance level) and corrected for multiple testing using the Bonferroni method. Results On the BUSI test set, our model achieved an mDice score of 68.33%. After Bonferroni correction, this represents a statistically significant improvement over both TransUNet (56.80%, , Cohen's) and U‐Net (49.21%, , Cohen's). Compared with CSA‐UNet (62.41%), our model's 68.33% corresponded to a large effect size (, Cohen's), although the difference did not reach the corrected significance threshold. Conclusions The proposed model effectively enables adaptive multi‐scale feature learning and context‐aware attention, leading to superior segmentation performance in breast ultrasound images.
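The two evaluation metrics named above, mDice and mIoU, can be illustrated with a minimal sketch. It assumes binary masks stored as nested lists of 0/1 values and assumes that the "average" is taken per image over the test set; the function names `dice_and_iou` and `mean_scores` are hypothetical and are not taken from the authors' code. The handling of two empty masks (counted as a perfect match) is likewise an assumption, since the abstract does not state a convention.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks given as nested lists of 0/1."""
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must contain the same number of pixels")
    inter = sum(1 for a, b in zip(p, t) if a and b)  # |P ∩ T|
    p_sum = sum(1 for a in p if a)                   # |P|
    t_sum = sum(1 for b in t if b)                   # |T|
    if p_sum + t_sum == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0
    union = p_sum + t_sum - inter                    # |P ∪ T|
    dice = 2.0 * inter / (p_sum + t_sum)
    iou = inter / union
    return dice, iou


def mean_scores(pairs):
    """Average Dice and IoU over an iterable of (pred, target) mask pairs."""
    scores = [dice_and_iou(p, t) for p, t in pairs]
    if not scores:
        raise ValueError("at least one mask pair is required")
    n = len(scores)
    m_dice = sum(d for d, _ in scores) / n
    m_iou = sum(i for _, i in scores) / n
    return m_dice, m_iou


if __name__ == "__main__":
    # Toy 2 x 2 masks, purely illustrative.
    pairs = [
        ([[1, 1], [0, 0]], [[1, 0], [0, 0]]),
        ([[0, 1], [1, 1]], [[0, 1], [1, 1]]),
    ]
    print(mean_scores(pairs))
```

Dice weights the overlap twice relative to the combined mask sizes, so for a partially overlapping pair it is never lower than IoU; reporting both makes it easier to compare with work that uses only one of them.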
Wang et al. (Thu,) studied this question.
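The statistical comparison described in the abstract (paired t‐tests, Bonferroni correction, and Cohen's effect size) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' analysis: it assumes per‐image Dice scores for two models on the same test images, and it uses the paired‐samples form of Cohen's d (mean difference divided by the standard deviation of the differences), which the abstract does not specify. The function names are hypothetical. The sketch stops at the t statistic and degrees of freedom; converting these to a p value requires a t‐distribution routine from a statistics library.

```python
import math
from statistics import mean, stdev


def paired_t_and_effect(scores_a, scores_b):
    """Return (t statistic, Cohen's d for paired samples, degrees of freedom)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must be scored on the same images")
    if len(scores_a) < 2:
        raise ValueError("at least two paired scores are required")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    d_mean = mean(diffs)
    d_sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if d_sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_stat = d_mean / (d_sd / math.sqrt(n))
    cohen_d = d_mean / d_sd  # assumed paired-samples variant of Cohen's d
    return t_stat, cohen_d, n - 1


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


if __name__ == "__main__":
    # Made-up per-image Dice scores, purely illustrative.
    model_a = [0.70, 0.65, 0.72, 0.60]
    model_b = [0.60, 0.60, 0.62, 0.55]
    print(paired_t_and_effect(model_a, model_b))
    # alpha and the number of comparisons here are placeholders.
    print(bonferroni_alpha(0.05, 3))
```

Because the Bonferroni threshold shrinks with the number of comparisons, a comparison can show a large effect size and still fall short of the corrected threshold, which is the pattern the abstract reports for CSA‐UNet.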