Convolutional networks, Transformers, Mamba-based state space models, and their hybrid variants have shown promise in medical image classification, yet they struggle with real-world clinical challenges such as heterogeneous imaging quality, texture-rich anatomical structures, and edge ambiguity in low-resolution features. To address these challenges, together with the Graphics Processing Unit (GPU) memory bottleneck in processing high-resolution medical images, we propose MFGENet (Multi-scale Fusion Global Enhancement Network), a novel architecture that integrates frequency-spatial hybrid representation with efficient local-global context modeling. First, a wavelet-based stem module replaces conventional downsampling, decomposing features via the Haar transform into multi-frequency components. This preserves critical edge and texture details in the high-frequency maps while using low-frequency semantics to generate adaptive gating controls, significantly mitigating edge blurring. Second, our Global Dynamic Enhanced Block (GDE Block) incorporates a parallel enhancement subnetwork, which employs group-wise processing with parallel dilated-convolution and spatial-channel attention paths to capture long-range dependencies while maintaining computational efficiency. Because lesions and regions of interest (ROIs) vary in appearance, scale, and number across medical images, we also design a Multi-Path Dynamic Convolutional Residual Fusion (MPDConv) module that dynamically adjusts the number of convolution layers and the kernel sizes to capture image diversity and multi-scale features, enhancing the network’s adaptability to different medical images. Third, a Multi-scale Fusion Attention Module (MFA Module) introduces an additive similarity function with multi-kernel depthwise convolutions, reducing the quadratic complexity O(N²) to linear complexity O(N) while fusing cross-scale features.
Compared with lightweight models (e.g., EfficientNet-B3, ConvNeXt-T), MFGENet achieves significantly higher accuracy while maintaining comparable or lower GPU memory consumption. Against high-accuracy models (such as MedViT and MedMamba), MFGENet reduces GPU memory consumption by up to 62% while maintaining identical performance levels. Comparisons across 16 medical imaging datasets demonstrate that MFGENet’s design strikes an effective balance among structural sensitivity, local-global context modeling, and resource efficiency, making it well-suited for memory-constrained clinical applications.
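
As a rough illustration of the wavelet-based stem described above, the following minimal sketch applies a single-level 2D Haar transform to a small image and then modulates a high-frequency map with a gate derived from the low-frequency band. It is not the authors' implementation: the function names (haar_dwt2, gate_details), the choice of a sigmoid gate, and the use of plain Python lists instead of tensors are all assumptions made for illustration. The abstract states only that low-frequency semantics generate adaptive gating controls.

```python
import math


def haar_dwt2(image):
    """Single-level orthonormal 2D Haar transform of an even-sized 2D list.

    Returns four half-resolution maps: the low-frequency approximation and
    three high-frequency detail maps (column, row, and diagonal differences).
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must both be even")
    low, d_cols, d_rows, d_diag = [], [], [], []
    for r in range(0, rows, 2):
        low_r, dc_r, dr_r, dd_r = [], [], [], []
        for c in range(0, cols, 2):
            # The 2x2 block [[a, b], [d, e]] is reduced to one value per map.
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            low_r.append((a + b + d + e) / 2.0)  # average-like, low frequency
            dc_r.append((a - b + d - e) / 2.0)   # change across columns
            dr_r.append((a + b - d - e) / 2.0)   # change across rows
            dd_r.append((a - b - d + e) / 2.0)   # diagonal change
        low.append(low_r)
        d_cols.append(dc_r)
        d_rows.append(dr_r)
        d_diag.append(dd_r)
    return low, d_cols, d_rows, d_diag


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate_details(low, detail):
    """Hypothetical gating: scale each detail value by sigmoid(low value).

    This stands in for the learned gate in the paper, whose exact form the
    abstract does not specify.
    """
    return [
        [sigmoid(l) * d for l, d in zip(low_row, detail_row)]
        for low_row, detail_row in zip(low, detail)
    ]


if __name__ == "__main__":
    image = [
        [1.0, 2.0, 0.0, 0.0],
        [3.0, 4.0, 0.0, 8.0],
        [5.0, 5.0, 1.0, 1.0],
        [5.0, 5.0, 1.0, 1.0],
    ]
    low, d_cols, d_rows, d_diag = haar_dwt2(image)
    print("low-frequency map:", low)
    print("gated column details:", gate_details(low, d_cols))
```

Each output map has half the height and width of the input, so the transform downsamples as a strided convolution or pooling layer would. Unlike those, the four maps together retain all of the input information, which is the property the stem relies on to keep edge and texture detail.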
Yang et al. (Mon,) studied this question.