What question did this study set out to answer?

The central aim is to develop a method for accurately segmenting obstacles on roadside traffic facilities to improve automatic cleaning vehicles' efficacy.

April 22, 2026 · Open Access

Roadside Traffic Facility Facade General Obstacle Segmentation Based on Vision Language Model and Similarity Loss Function for Automatic Cleaning Vehicle

Key Points

The study aims to segment obstacles on roadside traffic facility facades accurately so that automatic cleaning vehicles can work more effectively.
The authors collected a vision-language obstacle dataset, the Road-side General Obstacles Dataset (RGOD), labeled with segmentation masks and language descriptions.
They developed the VLM-GOS model, which emphasizes the distinction between background and foreground targets when segmenting obstacles.
They evaluated the segmentation results with several metrics and compared them against existing models.
The method increased accuracy by 3% over existing segmentation models such as MaskFormer, SegFormer, and ASD-Net.
It improved the perceptual ability of the automatic cleaning vehicle's model.
The model was also more interpretable in obstacle identification.

Abstract

Tunnels, soundproof screens, and other vertical roadside traffic facilities play an important role in isolating the driving environment, maintaining driving safety, and reducing driving noise. The longer these facade structures are in service, the more polluted they become, which creates traffic safety problems. Obstacles of different shapes, colors, and sizes on these three-dimensional walls are the most challenging problem in environment perception for intelligent cleaning. This paper proposes an obstacle segmentation method based on a vision-language model to address this problem. First, a vision-language obstacle dataset, named the Road-side General Obstacles Dataset (RGOD), is collected in a constructed experimental environment and labeled with both segmentation masks and language descriptions. These labeled data serve as the training input of the perception model, which produces the foreground-background separation results. Second, a VLM-GOS model is proposed to segment special-shaped obstacles; it emphasizes the distinction between background and foreground targets. Finally, general obstacles are segmented by the vision-language model trained with a similarity loss function, and the results are evaluated with several metrics. Experimental results show that, compared with models such as MaskFormer, SegFormer, and ASD-Net, this method improves the model's perceptual ability and increases accuracy by 3%. More importantly, the model is more interpretable.
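The abstract does not name the evaluation metrics, so the following is only an illustrative sketch of how a foreground-background obstacle mask might be scored. It assumes two common choices, pixel accuracy and foreground intersection over union (IoU); the function names, the nested-list mask format, and the toy masks are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Assumed mask format: 1 = obstacle (foreground), 0 = facade (background).
Mask = List[List[int]]


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) counted over every pixel of two binary masks."""
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")
    tp = fp = fn = tn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1  # obstacle pixel correctly predicted
            elif p:
                fp += 1  # background pixel predicted as obstacle
            elif t:
                fn += 1  # obstacle pixel missed
            else:
                tn += 1  # background pixel correctly predicted
    return tp, fp, fn, tn


def pixel_accuracy(pred: Mask, truth: Mask) -> float:
    """Fraction of pixels whose predicted label matches the ground truth."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def foreground_iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of the obstacle (foreground) class."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    union = tp + fp + fn
    # Two empty masks agree perfectly, so treat that case as IoU = 1.
    return tp / union if union else 1.0


# Toy 3x3 example (made-up values, not data from RGOD).
truth = [[0, 1, 1],
         [0, 1, 0],
         [0, 0, 0]]
pred = [[0, 1, 0],
        [0, 1, 1],
        [0, 0, 0]]

print(confusion_counts(pred, truth))  # (2, 1, 1, 5)
print(foreground_iou(pred, truth))    # 0.5
```

On a facade image where obstacles cover only a few pixels, pixel accuracy is dominated by the background, which is one reason a foreground-specific measure such as IoU is often reported alongside it.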
