What question did this study set out to answer?

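This study asks whether integrating multi-scale local features with global self-attention can overcome the fixed receptive fields that limit CNNs in image classification. It introduces AMSRA-Net to test that idea.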

March 15, 2026

AMSRA-Net: An Adaptive Multi-Scale Residual Attention Network for Image Classification

Key Points

This research introduces AMSRA-Net to address the fixed receptive fields of CNNs, which restrict modeling of long-range semantic dependencies in image classification.
Developed an Adaptive Multi-Scale Residual Attention Network architecture.
Integrated multi-scale features with global self-attention.
Conducted experiments on the CIFAR-10 dataset to evaluate performance.
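Achieved 95.89% classification accuracy, surpassing ResNet-18 (95.55%) and Compact Convolutional Transformers (95.04%) while maintaining lower model complexity.
Showed in ablation studies that accuracy drops to 89.25% without the gated self-attention engine and to 88.80% when the multi-scale feature decoupler is reduced to single-scale convolution.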

Abstract

Although convolutional neural networks (CNNs) have achieved remarkable success in image classification tasks, their inherent limitation of fixed receptive fields restricts their ability to model long-range semantic dependencies. To address this challenge, we propose a novel network architecture, Adaptive Multi-Scale Residual Attention Network (AMSRA-Net), which integrates multi-scale local features with global self-attention mechanisms. AMSRA-Net is composed of four cascaded hierarchical multimodal residual attention blocks (HMRABs), each incorporating a multi-scale feature decoupler (MSFD) and a lightweight gated self-attention engine (LGSA-Engine). The multi-scale feature decoupler employs a channel-splitting strategy to enable parallel extraction of features at different granularities. Building upon this, the gated self-attention engine establishes long-range dependencies across spatial locations via nonlinear transformations, dynamically suppressing redundant background information while enhancing critical semantic features. This results in a deeply synergistic mechanism that combines cross-scale feature interaction with dynamic feature calibration. Experiments conducted on the CIFAR-10 dataset demonstrate that AMSRA-Net achieves a classification accuracy of 95.89%, surpassing baseline models such as ResNet-18 (95.55%) and Compact Convolutional Transformers (CCT, 95.04%), while maintaining lower model complexity. Ablation studies further reveal significant performance drops when removing the gated self-attention engine (down to 89.25%) or degrading the multi-scale feature decoupler to single-scale convolution (down to 88.80%), validating the effectiveness of the proposed dual mechanism of “feature decoupling and dynamic fusion.” This study highlights the efficacy of combining self-attention with multi-scale convolutions and offers a new paradigm for integrating CNNs with global attention mechanisms.
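The abstract describes the block design only at a high level, so the following is a rough NumPy sketch of the two ideas it names, channel-split multi-scale extraction followed by gated self-attention inside a residual connection. It is not the authors' implementation. Everything specific here is an assumption made for illustration: the function names, the kernel sizes (1, 3, 5, 7), the use of a fixed box filter in place of learned convolutions, the random projection weights, and the sigmoid form of the gate.

```python
import numpy as np

def box_filter(x, k):
    """Stand-in for a learned k x k convolution (assumption): mean over a
    k x k window with edge padding. x has shape (C, H, W)."""
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    _, H, W = x.shape
    out = np.zeros_like(x)
    for dy in range(k):
        for dx in range(k):
            out += xp[:, dy:dy + H, dx:dx + W]
    return out / (k * k)

def multi_scale_decouple(x, kernel_sizes=(1, 3, 5, 7)):
    """Channel splitting: each channel group is processed in parallel at its
    own scale, then the groups are concatenated back together.
    The kernel sizes are hypothetical."""
    groups = np.array_split(x, len(kernel_sizes), axis=0)
    return np.concatenate(
        [box_filter(g, k) for g, k in zip(groups, kernel_sizes)], axis=0
    )

def gated_self_attention(x, rng):
    """Self-attention across all spatial positions, scaled by a sigmoid gate.
    Weights are random placeholders; a trained model would learn them."""
    C, H, W = x.shape
    tokens = x.reshape(C, H * W).T                      # (N, C), N = H * W
    Wq, Wk, Wv, Wg = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(4))
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = q @ k.T / np.sqrt(C)                       # (N, N) pairwise affinities
    scores -= scores.max(axis=1, keepdims=True)         # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)             # row-wise softmax
    gate = 1.0 / (1.0 + np.exp(-(tokens @ Wg)))         # values in (0, 1)
    out = gate * (attn @ v)                             # gate can suppress features
    return out.T.reshape(C, H, W), attn

def residual_block(x, rng):
    """One block: multi-scale extraction, gated attention, residual add."""
    y = multi_scale_decouple(x)
    y, _ = gated_self_attention(y, rng)
    return x + y
```

Calling `residual_block(x, np.random.default_rng(0))` on an array of shape `(8, 6, 6)` returns an array of the same shape, so four such blocks could be chained as the abstract describes. The gate multiplies the attended features element-wise, which is one plausible reading of "dynamically suppressing redundant background information".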
