The Transformer has achieved great success in the field of medical image segmentation, but its quadratic computational complexity limits its application in dense medical image prediction. Recently, the receptance weighted key value (RWKV) architecture has garnered widespread attention due to its linear computational complexity and its capability of parallel computation during training. Despite the RWKV model's proficiency in addressing long-range modeling tasks with linear computational complexity, most current RWKV-based approaches employ static scanning patterns. These patterns may inadvertently incorporate biased prior knowledge into the model's predictions. To address this challenge, we propose a multi-head scan strategy combined with padding methods to effectively simulate spatial continuity in 2D images. Within the Feature Aggregation Attention (FAA) module, asymmetric convolutions are designed to aggregate 1D sequence features along a single dimension, thereby expanding effective receptive fields while preserving structural sparsity. Additionally, panoramic token shift (P-Shift) effectively models local dependency relationships by moving tokens from a wide receptive field. Extensive experiments conducted on the ISIC17/18 and ACDC datasets demonstrate that our method exhibits superior performance in dense medical image prediction tasks.
A Fri, study studied this question.
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: