Medical image segmentation still faces three critical challenges: insufficient joint modeling of local details and long-range dependencies, the high computational burden of transformer-based architectures for high-resolution inputs, and performance degradation caused by domain shift across imaging centers and acquisition devices. To address these issues, this paper proposes CMFA-Net, a CNN–Mamba collaborative feature alignment network for robust medical image segmentation. The proposed framework adopts Vision Mamba (VSSM) as the encoder backbone to capture long-range contextual dependencies with linear computational complexity. A CNN–Mamba fusion attention (CMFA) module is designed to integrate the local representation capability of convolution with the long-range modeling capability of Mamba, improving the segmentation of complex boundaries and multi-scale targets. In addition, an enhanced multi-scale context aggregation decoder (EMCAD) is introduced to reduce the semantic gap between encoder and decoder features and strengthen hierarchical feature fusion. To enhance cross-dataset robustness, a contrastive domain alignment learning (cDAL) strategy is applied in the intermediate feature space to learn domain-invariant discriminative representations via an InfoNCE-based objective. Experiments on the CirrMRI600+ pathological liver MRI dataset and several public polyp segmentation benchmarks show that the proposed method achieves competitive segmentation performance. Ablation studies provide empirical evidence for the contributions of the CMFA module, EMCAD decoder, and cDAL mechanism under the same experimental protocol. These results suggest that CMFA-Net is a promising framework for medical image segmentation across heterogeneous datasets.
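The abstract names an InfoNCE-based objective for cDAL but does not spell it out. As a rough illustration only, the generic InfoNCE loss for one anchor feature can be sketched as below. The function names `cosine` and `info_nce`, the cosine-similarity scoring, the temperature of 0.1, and the way positives and negatives are chosen are all assumptions made for this sketch. None of them are details confirmed by the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for a single anchor (hypothetical helper).

    Assumption: `positive` is a feature that should align with the anchor
    (for example, the same class seen in another domain) and `negatives`
    are features that should not. The paper's actual pairing rule is not
    given in the abstract.
    """
    # Scaled similarities, with the positive pair placed first.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]

    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))

    # Equals -log( exp(s_pos / t) / sum_k exp(s_k / t) ).
    return log_denominator - logits[0]


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    aligned = info_nce(anchor, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
    misaligned = info_nce(anchor, [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]])
    print(aligned < misaligned)
```

In this toy case the loss is small when the positive feature points in the same direction as the anchor and large when a negative is closer to the anchor than the positive is, which is the behavior a feature-alignment objective relies on.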
Yang et al. studied this question.