Semantic segmentation methods built on powerful convolutional neural networks (CNNs) and complex model structures achieve good segmentation accuracy, but their slow inference speed limits their use in practical applications such as autonomous driving and medical diagnosis. Real-time semantic segmentation has therefore received increasing attention. However, most existing real-time methods improve inference speed at a significant cost in segmentation precision, and striking a good balance between inference speed and precision remains a major issue. To address this issue, we propose a real-time semantic segmentation network, the Multi-Shape Enhancement Pyramid Network (MSEPNet). First, we propose an efficient spatial inverted residual (ESIR) module to effectively extract multi-scale spatial information. Next, to capture multi-scale semantic information while maintaining efficient inference, we introduce an efficient contextual residual (ECR) module. Finally, we present the multi-shape enhancement pyramid (MSEP) module to capture multi-scale and multi-shape contextual information. The proposed MSEPNet achieves competitive results on street scene datasets. Specifically, with only 1.04 million (1.04M) parameters, it achieves 76.7% and 72.5% mean Intersection over Union (mIoU) at 144.4 and 108.9 frames per second (FPS) on the Cityscapes and Cambridge-driving Labeled Video Database (CamVid) test sets, respectively. Furthermore, we conduct additional experiments on the Stanford Background dataset to verify the robustness of MSEPNet in diverse real-world environments, demonstrating its generalization ability beyond standard benchmarks.
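The accuracy figures above are reported as mIoU. As a minimal sketch of how this metric is commonly computed from a confusion matrix (this is not the authors' evaluation code; the function name `mean_iou` and the toy label lists are illustrative assumptions), per-class IoU is TP / (TP + FP + FN), averaged over the classes that occur:

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union for two flat sequences of class labels."""
    # confusion[t][p] counts pixels whose true class is t and predicted class is p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        # false positives: predicted as c, but the true class differs
        fp = sum(confusion[t][c] for t in range(num_classes)) - tp
        # false negatives: true class is c, but predicted as something else
        fn = sum(confusion[c]) - tp
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example (hypothetical labels for six pixels and three classes)
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, 3))
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is about 0.5. Benchmark evaluations usually accumulate the confusion matrix over the whole test set before averaging, and may ignore pixels carrying a "void" label.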
Chen et al. studied this question.
Engineering Applications of Artificial Intelligence
Guangzhou University