What question did this study set out to answer?

This research aims to develop an efficient image semantic segmentation model that integrates local and global feature extraction.

April 23, 2026 | Open Access

Multi-Scale Context-Aware Network Implementation for Efficient Image Semantic Segmentation

Key Points

Aimed to develop an efficient image semantic segmentation model that integrates local and global feature extraction.
Proposed a Multi-Scale Context-aware Network (MSC-Net) within an encoder-decoder framework.
Combined a convolutional neural network backbone with a Multi-Scale Self-Attention module.
Conducted experiments comparing MSC-Net to SegFormer and DeepLabV3+ for performance evaluation.
Achieved 38.8% mean Intersection over Union (mIoU) and 98.4% accuracy (ACC).
Improved mIoU by approximately 3.0 and 3.3 percentage points over SegFormer and DeepLabV3+, respectively.
Reduced FLOPs and parameter size relative to SegFormer and DeepLabV3+ while improving mIoU.
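
The two reported metrics can be illustrated with a minimal sketch using their standard definitions. The source does not state the dataset or the exact evaluation code, so the helper names and the toy labels below are assumptions for illustration only. The toy case also shows one common reason accuracy can sit far above mIoU: a dominant class inflates pixel accuracy, while mIoU averages over classes.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels by (true class, predicted class)."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class matches the label."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def mean_iou(cm):
    """Average of per-class IoU = TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, labeled otherwise
        fn = sum(cm[c]) - tp                       # labeled c, predicted otherwise
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / union)
    return sum(ious) / len(ious)


# Toy data (hypothetical): 95 background pixels (class 0), 5 foreground (class 1).
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 95 + [0] * 4 + [1] * 1  # only 1 of 5 foreground pixels is found

cm = confusion_matrix(y_true, y_pred, num_classes=2)
acc = pixel_accuracy(cm)
miou = mean_iou(cm)
print(f"ACC = {acc:.3f}, mIoU = {miou:.3f}")
```

Here the background class has an IoU of 95/99 and the foreground class 1/5, so the mean is much lower than the 96% pixel accuracy.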

Abstract

Image semantic segmentation is essential in autonomous driving, medical imaging, and remote sensing. While convolutional neural networks (CNNs) excel at local feature extraction and spatial structure modeling, their limited receptive fields restrict the capture of long-range dependencies and global semantic consistency. Transformers provide strong global modeling through self-attention but often lack local inductive bias and generalize less well on small datasets. To address these limitations, this paper proposes a Multi-Scale Context-aware Network (MSC-Net) for image semantic segmentation. Within an encoder–decoder framework, MSC-Net combines a convolutional backbone with a Multi-Scale Self-Attention module to integrate the complementary strengths of CNNs and attention mechanisms. The backbone extracts local texture and structural information and can adopt architectures such as MobileNet, Xception, DRN, and ResNet, while the attention module captures long-range dependencies and multi-scale contextual information. This design improves cross-layer feature collaboration, multi-scale feature fusion, and boundary quality while maintaining computational efficiency. Experimental results show that MSC-Net achieves 38.8% mean Intersection over Union (mIoU) and 98.4% accuracy (ACC) under comparable computational settings. Compared with SegFormer and DeepLabV3+, the model improves mIoU by approximately 3.0 and 3.3 percentage points, respectively, while reducing FLOPs and parameter size.
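
The idea of pairing local backbone features with multi-scale attention can be sketched in a few lines of NumPy. This is not the authors' module: the abstract gives no layer details, so the pooling scales, the single attention head, the absence of learned projections, and the residual fusion are all assumptions chosen to keep the example small. The random array stands in for the output of a CNN backbone.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def avg_pool(feat, k):
    """Non-overlapping k x k average pooling on an (H, W, C) map.

    Assumes H and W are divisible by k.
    """
    h, w, c = feat.shape
    return feat.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))


def multi_scale_self_attention(feat, scales=(1, 2, 4)):
    """Let every pixel attend to the feature map pooled at several scales.

    Hypothetical simplification: one head, no learned query/key/value
    projections, and the per-scale contexts are simply averaged.
    """
    h, w, c = feat.shape
    queries = feat.reshape(h * w, c)
    contexts = []
    for k in scales:
        tokens = avg_pool(feat, k).reshape(-1, c)        # keys and values at scale k
        attn = softmax(queries @ tokens.T / np.sqrt(c))  # (H*W, num_tokens)
        contexts.append(attn @ tokens)                   # context gathered at scale k
    context = np.mean(contexts, axis=0)
    # Residual sum keeps the local features and adds the global context.
    return (queries + context).reshape(h, w, c)


rng = np.random.default_rng(0)
backbone_features = rng.standard_normal((8, 8, 16))  # stand-in for CNN backbone output
fused = multi_scale_self_attention(backbone_features)
print(fused.shape)
```

Coarser scales give each pixel fewer, wider-context tokens to attend to, which is one inexpensive way to add long-range information without full-resolution attention at every scale.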
