Convolutional neural networks (CNNs) and transformers are widely used in medical image processing tasks, and the state space sequence model (SSM) architecture has been proposed to address their limitations by improving scaling efficiency and avoiding the quadratic complexity of transformers. Inspired by the Mamba architecture, this paper proposes the dual-attention vision scaled-UNet (DAVS-UNet) for medical image segmentation, in which adaptive multi-scale selection (AMS) is applied to the input image to better capture details at different scales and extract input features. Furthermore, atrous spatial pyramid pooling (ASPP) is introduced after the final encoder to expand the receptive field by collecting global contextual information. Experiments on publicly available datasets show that DAVS-UNet achieves excellent performance on the ISIC2017, ISIC2018, and Synapse datasets and outperforms existing SSM-based networks employed in medical image segmentation tasks. The code is available at https://github.com/zhzhuac/DAVS.
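To illustrate how ASPP enlarges the region each output position can see, the following is a minimal one-dimensional sketch in plain Python. It is not the DAVS-UNet implementation: the function names (`atrous_conv1d`, `effective_kernel_size`, `aspp_1d`), the kernel, and the dilation rates are hypothetical choices made for illustration, and a real model would apply learned 2-D convolutions to feature maps.

```python
def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution with zero padding and same-length output."""
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * rate  # taps are spaced `rate` samples apart
            if 0 <= idx < len(signal):     # positions outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def effective_kernel_size(kernel_size, rate):
    """Span covered by a dilated kernel: k + (k - 1) * (rate - 1)."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def aspp_1d(signal, kernel, rates):
    """Parallel atrous branches plus a global-average branch (illustrative only).

    Returns one list per branch; a real ASPP block would concatenate the
    branches along the channel axis and fuse them with a 1x1 convolution.
    """
    branches = [atrous_conv1d(signal, kernel, r) for r in rates]
    global_avg = sum(signal) / len(signal)
    branches.append([global_avg] * len(signal))  # global context copied to every position
    return branches
```

With a kernel of size 3, `effective_kernel_size(3, 6)` evaluates to 13 and `effective_kernel_size(3, 12)` to 25, so branches with larger rates cover a wider context without adding weights. The global-average branch is what supplies signal-wide context to every position.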
Zihui Zhu studied this question.