Semantic segmentation of urban scenes is an important task in computer vision. However, urban road scenes still pose many challenges, such as class imbalance and complex backgrounds. In existing semantic segmentation methods for urban scenes, these problems lead to blurred segmentation edges and misclassification of occluded objects, which limits the accuracy and robustness of these methods in practical applications. In this paper, we propose a model that recursively enhances edge feature representations while incorporating local spatial context. To sharpen segmentation edges, we introduce Multi-scale Central Difference Convolution (MS-CDC), which fuses edge features across multiple scales. A feature pyramid-based FeedBack Connection (FBC) module fuses multi-scale features while recursively enhancing the original network, thereby improving the model's robustness to occluded objects. In addition, we design a Local Feature Extraction (LFE) module that captures pixel-wise relationships by constructing local pixel graphs and center pixel graphs; it learns local contextual information to extract finer-grained pixel features. Experimental results on the Cityscapes and Mapillary Vistas datasets validate the effectiveness of the proposed model, which achieves 80.67% and 45.5% mIoU on their respective validation sets. Our code is open-source at https://github.com/sanmanaa/segmentation-autodriving-graph-centralconv.
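To illustrate why a central difference convolution emphasizes edges, the sketch below implements the commonly used formulation on a single-channel NumPy array: a vanilla convolution minus θ times the center pixel scaled by the kernel sum, i.e. y(p0) = Σ w(pn)·x(p0+pn) − θ·x(p0)·Σ w(pn). On flat regions the two terms cancel (for θ = 1), so only intensity changes such as object boundaries produce a response. This is a minimal sketch under stated assumptions, not the authors' MS-CDC: the function names, the default θ = 0.7, and the multi-scale fusion by averaging over dilation rates are all hypothetical choices made for illustration.

```python
import numpy as np


def central_difference_conv2d(x, w, theta=0.7, dilation=1):
    """Single-channel 2D central difference convolution (same-size output).

    Computes y(p0) = sum_n w(pn) * x(p0 + pn) - theta * x(p0) * sum_n w(pn),
    using cross-correlation as in deep-learning convolutions and zero padding.
    Assumes an odd-sized kernel `w`.
    """
    kh, kw = w.shape
    rh, rw = kh // 2, kw // 2
    H, W = x.shape
    pad_h, pad_w = rh * dilation, rw * dilation
    xp = np.pad(x, ((pad_h, pad_h), (pad_w, pad_w)), mode="constant")

    # Vanilla (dilated) convolution term: weighted sum of neighbours.
    out = np.zeros((H, W), dtype=float)
    for i in range(kh):
        for j in range(kw):
            di = (i - rh) * dilation
            dj = (j - rw) * dilation
            out += w[i, j] * xp[pad_h + di:pad_h + di + H, pad_w + dj:pad_w + dj + W]

    # Central difference term: subtract theta * centre value * kernel sum.
    out -= theta * w.sum() * x
    return out


def multi_scale_cdc(x, w, theta=0.7, dilations=(1, 2, 3)):
    """Hypothetical multi-scale fusion: average CDC responses over dilations."""
    return np.mean(
        [central_difference_conv2d(x, w, theta, d) for d in dilations], axis=0
    )


if __name__ == "__main__":
    # A vertical step edge: left half 0, right half 1.
    img = np.zeros((6, 6))
    img[:, 3:] = 1.0
    kernel = np.ones((3, 3))
    # With theta = 1, interior responses are non-zero only next to the edge.
    print(central_difference_conv2d(img, kernel, theta=1.0))
```

Setting θ = 0 recovers an ordinary convolution, so θ trades off intensity information against gradient-like (edge) information; varying the dilation rate is one simple way to gather such responses at several receptive-field sizes.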
Wáng et al. (Fri,) studied this question.