We introduce a novel hybrid deep learning module, the Mamba-Spatial-Temporal Generator (MSTG), which combines Convolutional Neural Networks (CNNs) with the Mamba architecture. Conventional CNNs extract local features effectively within diffusion models, but their limited receptive field restricts their capacity to capture long-range dependencies. To overcome this limitation, MSTG first applies CNN-based convolutional and pooling layers to extract multi-level local features, then passes these features through Mamba blocks built on State Space Models (SSMs). Because Mamba scales linearly with sequence length and models long sequences well, it can adaptively select and fuse global contextual information. MSTG thus retains the local perceptual advantages of CNNs while gaining the global dynamic modeling capacity of Mamba, which significantly improves the modeling of complex spatial and sequential dependencies without compromising computational efficiency. The module has a clear structure and scales well, offering a new and effective way to improve the generation of 4D cardiac medical images.
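The two-stage idea described above (local convolution and pooling, followed by a linear-time state-space scan) can be illustrated with a toy one-dimensional sketch. This is not the authors' implementation: the function names `conv1d`, `max_pool1d`, `selective_scan`, and `mstg_sketch`, the scalar hidden state, the fixed parameters, and the residual fusion at the end are all assumptions made for illustration. An actual Mamba block uses learned, input-dependent projections for the step size and for the input and output matrices, whereas here only the step size depends on the input.

```python
import math


def conv1d(x, kernel):
    """Valid 1D cross-correlation over a list of floats (local feature extraction)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]


def max_pool1d(x, size=2):
    """Non-overlapping max pooling; a trailing remainder shorter than `size` is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def softplus(z):
    """Numerically stable softplus, used to keep the step size positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(u, a=-1.0, b=1.0, c=1.0, w_delta=1.0):
    """Scalar-state SSM scan that runs in time linear in len(u).

    h_t = exp(delta_t * a) * h_{t-1} + delta_t * b * u_t
    y_t = c * h_t
    The step size delta_t = softplus(w_delta * u_t) depends on the input,
    which is a simplified stand-in for Mamba's selection mechanism.
    """
    h = 0.0
    ys = []
    for u_t in u:
        delta = softplus(w_delta * u_t)
        a_bar = math.exp(delta * a)   # decay factor in (0, 1) when a < 0
        b_bar = delta * b             # simplified (Euler-style) discretization
        h = a_bar * h + b_bar * u_t   # the state carries context from earlier steps
        ys.append(c * h)
    return ys


def mstg_sketch(x, kernel=(-1.0, 0.0, 1.0)):
    """Hypothetical pipeline: conv + ReLU + pool, then SSM scan, then residual fusion."""
    local = max_pool1d([max(v, 0.0) for v in conv1d(x, kernel)])
    global_ctx = selective_scan(local)
    return [l + g for l, g in zip(local, global_ctx)]


if __name__ == "__main__":
    signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    print(mstg_sketch(signal))
```

The scan visits each element once, which is where the linear cost in sequence length comes from. The hidden state `h` is what lets a late output depend on inputs far outside the convolution kernel's receptive field.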
Zhang et al. (Sat,) studied this question.