What question did this study set out to answer?

The aim is to develop an efficient architecture for classifying medical images while addressing GPU memory limitations and varying image qualities.

June 18, 2026 · Open Access

An Efficient Frequency-Aware Network with Local-Global Dynamics Fusion for Memory-Constrained Clinical Medical Image Classification

Key Points

Introduces MFGENet, which integrates a frequency-spatial hybrid representation with efficient local-global context modeling.
Uses a wavelet-based stem module to preserve edge and texture details, and a Global Dynamic Enhanced Block (GDE Block) to capture long-range dependencies.
Adds a Multi-scale Fusion Attention Module (MFA Module) that fuses cross-scale features while reducing attention complexity from quadratic to linear.
Achieves significantly higher accuracy than lightweight models (e.g., EfficientNet-B3, ConvNeXt-T) at comparable or lower GPU memory consumption.
Reduces GPU memory consumption by up to 62% relative to high-accuracy models (such as MedViT and MedMamba) while matching their performance.
Demonstrates an effective balance among structural sensitivity, local-global context modeling, and resource efficiency across 16 medical imaging datasets.

Abstract

Convolutional networks, Transformers, Mamba-based state space models, and their hybrid variants have shown promise in medical image classification, yet they struggle with real-world clinical challenges such as heterogeneous imaging quality, texture-rich anatomical structures, and edge ambiguity in low-resolution features. To address these challenges, together with the Graphics Processing Unit (GPU) memory bottleneck that arises when processing high-resolution medical images, we propose MFGENet (Multi-scale Fusion Global Enhancement Network), a novel architecture that integrates a frequency-spatial hybrid representation with efficient local-global context modeling. First, a wavelet-based stem module replaces conventional downsampling and decomposes features into multi-frequency components via the Haar transform. This preserves critical edge and texture details in the high-frequency maps while using low-frequency semantics to generate adaptive gating controls, significantly mitigating edge blurring. Second, our Global Dynamic Enhanced Block (GDE Block) incorporates a parallel enhancement subnetwork that employs group-wise processing with parallel dilated-convolution and spatial-channel attention paths to capture long-range dependencies while maintaining computational efficiency. Because lesions and Regions of Interest (ROIs) vary in appearance, scale, and even number from one medical image to another, we also design a Multi-Path Dynamic Convolutional Residual Fusion (MPDConv) module that dynamically adjusts the number of convolution layers and the kernel sizes to capture image diversity and multi-scale features, enhancing the network’s adaptability to different medical images. Third, a Multi-scale Fusion Attention Module (MFA Module) introduces an additive similarity function with multi-kernel depthwise convolutions, reducing the quadratic complexity $O(N^2)$ to linear complexity $O(N)$ while fusing cross-scale features.
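To make the wavelet-based downsampling idea concrete, the following is a minimal sketch of a single-level 2D Haar transform in plain Python. It is not the authors' stem module: the function names `haar_dwt2` and `haar_idwt2` and the sub-band names (`low`, `dx`, `dy`, `dxy`) are illustrative assumptions, and the adaptive gating step is not shown because the abstract does not specify it. The sketch only illustrates that the transform halves the resolution while keeping edge information in separate high-frequency maps, so the input remains exactly recoverable.

```python
def haar_dwt2(image):
    """Single-level orthonormal 2D Haar transform of a 2D list.

    Returns four sub-bands at half resolution:
    low (local average), dx (left-right difference),
    dy (top-bottom difference), dxy (diagonal difference).
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must both be even")
    low, dx, dy, dxy = [], [], [], []
    for r in range(0, rows, 2):
        low_row, dx_row, dy_row, dxy_row = [], [], [], []
        for c in range(0, cols, 2):
            # One 2x2 block: a b / d e
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            low_row.append((a + b + d + e) / 2)  # low-frequency content
            dx_row.append((a - b + d - e) / 2)   # responds to vertical edges
            dy_row.append((a + b - d - e) / 2)   # responds to horizontal edges
            dxy_row.append((a - b - d + e) / 2)  # responds to diagonal detail
        low.append(low_row)
        dx.append(dx_row)
        dy.append(dy_row)
        dxy.append(dxy_row)
    return low, dx, dy, dxy


def haar_idwt2(low, dx, dy, dxy):
    """Invert haar_dwt2, rebuilding the full-resolution image."""
    rows, cols = len(low), len(low[0])
    image = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for r in range(rows):
        for c in range(cols):
            s, h, v, g = low[r][c], dx[r][c], dy[r][c], dxy[r][c]
            image[2 * r][2 * c] = (s + h + v + g) / 2
            image[2 * r][2 * c + 1] = (s - h + v - g) / 2
            image[2 * r + 1][2 * c] = (s + h - v - g) / 2
            image[2 * r + 1][2 * c + 1] = (s - h - v + g) / 2
    return image


if __name__ == "__main__":
    # Toy 4x4 "image" with a vertical edge between columns 0 and 1.
    img = [[0, 4, 4, 4] for _ in range(4)]
    low, dx, dy, dxy = haar_dwt2(img)
    print("low:", low)
    print("dx: ", dx)
    print("lossless:", haar_idwt2(low, dx, dy, dxy) == img)
```

Unlike strided convolution or pooling, which discard part of the signal, this decomposition keeps all of it: the edge in the toy image shows up as a nonzero entry in `dx` rather than being averaged away.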
Compared to lightweight models (e.g., EfficientNet-B3, ConvNeXt-T), MFGENet achieves significantly higher accuracy while maintaining comparable or lower GPU memory consumption. When evaluated against high-accuracy models (such as MedViT and MedMamba), MFGENet significantly reduces GPU memory consumption by up to 62% while maintaining identical performance levels. We conducted comparisons across 16 medical imaging datasets, and the results demonstrate that MFGENet’s design enables an effective balance among structural sensitivity, local-global context modeling, and resource efficiency, making it well-suited for memory-constrained clinical applications.
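The quadratic-to-linear complexity reduction described for the MFA Module can be illustrated with the generic argument used for kernelized linear attention. This is a sketch under stated assumptions, not the paper's formulation: the abstract does not define its additive similarity function, so the feature map $\phi$ and the separable similarity below are stand-ins, with $N$ taken as the number of tokens and $d$ as the feature dimension.

```latex
\begin{align}
% Softmax attention: the N x N score matrix costs O(N^2 d).
\mathrm{Attn}(Q, K, V) &= \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right) V \\
% Assumed separable similarity: sim(q_i, k_j) = \phi(q_i)^{\top} \phi(k_j).
y_i^{\top} &= \frac{\phi(q_i)^{\top} \sum_{j=1}^{N} \phi(k_j)\, v_j^{\top}}
                   {\phi(q_i)^{\top} \sum_{j=1}^{N} \phi(k_j)}
\end{align}
```

Because the two sums over $j$ do not depend on $i$, they can be computed once in $O(N d^2)$ and reused for every query, so the total cost grows linearly in $N$ instead of quadratically. Any similarity that separates into a query term and a key term in this way admits the same reordering.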
