ABSTRACT
To address edge loss and low segmentation accuracy in small regions in medical image segmentation, this study proposes a novel segmentation network, MSFIF‐Net, which integrates convolutional neural networks (CNNs) with a transformer. Built upon the TransUNet architecture, our approach introduces two new modules: the multi‐group contextual attention (MDGA) module and the multi‐scale dilated aggregation (MSDAM) module. The MDGA module enhances feature extraction across different dimensions by facilitating the interaction and fusion of multiple groups of contextual information. The MSDAM module optimizes feature fusion in the skip connections by combining multi‐scale dilated convolutions with global feature aggregation. For evaluation, we conduct extensive experiments on four data sets: Left Atrial Appendage and Pulmonary Vein CT (LAA & PV CT), ISIC‐2018, Chest X‐ray, and COVID‐19 CT scans. A series of ablation studies is designed to validate the effectiveness of the individual components of the proposed framework. Experimental results demonstrate that MSFIF‐Net achieves superior segmentation performance compared to existing models across five quantitative metrics, effectively mitigating the problem of low segmentation accuracy in small regions.
Wang et al. (Mon,) studied this question.