In dynamic SLAM (Simultaneous Localization and Mapping), machine vision must interpret scenes in a way that aligns closely with human semantic understanding in order to overcome interference from moving objects. DeepLabV3+ is a mainstream semantic segmentation algorithm that balances accuracy and speed. However, DeepLabV3+ does not weight its feature layers differently, does not address sample imbalance, and has a backbone network with a large parameter count. To tackle these issues, this paper introduces an attention mechanism into the fusion of the algorithm's multi-scale feature information, emphasizing important information and improving boundary recovery. A new lightweight feature extraction network serves as the backbone, and a more appropriate loss function is employed to balance the segmentation targets, thereby improving the final segmentation results. Experimental results show that, although the mean Intersection over Union (mIoU) on the PASCAL VOC 2007 dataset decreases by about 5 percentage points, the model's parameter count is reduced by about 89%. The lighter model maintains the accuracy of feature extraction and significantly improves object segmentation performance in dynamic scenes on mobile devices.
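For readers unfamiliar with the reported metric, the following is a minimal sketch of how mIoU is commonly computed from per-pixel class labels. It is not the paper's evaluation code: the function name `mean_iou`, the flattened-list input format, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    pred and target are flat sequences of integer class labels (hypothetical
    input format chosen for this sketch).
    """
    # confusion[t][p] counts pixels whose true class is t and predicted class is p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fp = sum(confusion[t][c] for t in range(num_classes)) - tp
        fn = sum(confusion[c]) - tp
        union = tp + fp + fn
        if union > 0:  # assumption: ignore classes absent from both label maps
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: two flattened 6-pixel label maps with 3 classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

Under this definition, a drop of 5 percentage points means the class-averaged overlap ratio falls by 0.05 in absolute terms, not by 5% relative to the baseline value.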
尹 et al. (Thu,) studied this question.