Abstract: U-Net and its variants are widely used in medical image segmentation. However, CNN-based models are limited in modeling long-range dependencies, while Transformer-based models are constrained by quadratic computational complexity. Recently, state space models (SSMs) such as Mamba have emerged as a promising alternative, as they model long-range interactions while maintaining linear computational complexity. This paper proposes CIHM, a hybrid Mamba model that provides contextual insights. Its key component is the context-insight Mamba-CNN (CiMC) layer, which adopts a hybrid Mamba-CNN approach to process features across different scales and levels in parallel, thereby enhancing the semantic segmentation of medical images. Specifically, the SSM branch of this layer captures long-range dependencies with comprehensive contextual semantic understanding, while the CNN branch extracts local features using a T-shaped convolution that focuses on patch centers. Moreover, to further aggregate multi-scale contextual information in medical images, we design the multiscale refining detail bridge (MRDB) module, which improves feature representation by fusing features through dense multiplication and concatenation. Comprehensive experiments on the ISIC2017, ISIC2018, Synapse, and DSB18 datasets demonstrate that CIHM delivers competitive performance and efficiency in medical image segmentation tasks. Notably, CIHM achieves better segmentation performance than the well-known U-Net and the modern hybrid network U-Mamba, while using 40 times and 345 times fewer parameters, respectively. This demonstrates the potential of our approach for building lightweight models.
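To make the abstract's linear-complexity claim for SSMs concrete, the following is a minimal sketch of a generic diagonal linear state-space recurrence over a 1-D sequence. It is not the CIHM or Mamba implementation: the function name `ssm_scan` and the parameters `a`, `b`, `c` are hypothetical, and Mamba's input-dependent (selective) parameters are deliberately omitted. It only illustrates that a single pass over the sequence carries information from early positions to later ones.

```python
def ssm_scan(x, a, b, c):
    """Illustrative diagonal linear state-space recurrence (hypothetical helper).

    For each input x_t:
        h_t = a * h_{t-1} + b * x_t   (elementwise over the state dimension)
        y_t = sum(c * h_t)
    The loop visits every element once, so the cost grows linearly with
    the sequence length (times the state size).
    """
    n = len(a)
    h = [0.0] * n  # hidden state, initialised to zero
    ys = []
    for x_t in x:
        # Update each state channel independently (diagonal state matrix).
        h = [a[i] * h[i] + b[i] * x_t for i in range(n)]
        # Project the state back to a scalar output.
        ys.append(sum(c[i] * h[i] for i in range(n)))
    return ys


# An impulse at the first position still influences later outputs,
# decaying by the factor a at each step.
outputs = ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[2.0])
```

With these toy values the impulse response decays geometrically, which shows how a fixed-size state can relay context across the whole sequence without comparing every pair of positions, as attention does.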
Ma et al. (Tue,) studied this question.