The perception system of an intelligent vehicle must accurately recognize the surrounding traffic environment, which is crucial for safe and efficient autonomous driving. However, most existing systems rely solely on visual perception. Auditory perception is a complementary approach that can address the limitations of visual perception and provide a more reliable basis for decision-making, and sound event detection (SED) is the core technique for implementing it. This study proposes a traffic sound event detection model based on PANNs-CNN10. First, acoustic features of traffic sounds are extracted using a filter bank (FBank). FBank features retain the full frequency-domain information of the bandpass filters, which facilitates the learning of complex feature relationships. Second, a multi-scale convolution block is introduced in the intermediate layers of the network so that it can learn features at different scales, improving the expressiveness of the model. Third, a hybrid multi-scale attention module and a Shuffle Attention module are introduced in the intermediate and deep layers. These modules focus on the correlations among channels and enhance the network’s ability to capture key features. The resulting model is the Traffic Sound Event Detection Convolutional Neural Network (TSED-CNN). TSED-CNN achieves an accuracy of 96.378% for traffic sound event detection, improving on the baseline model by 1.705%. The results show that the proposed method can accurately detect traffic sounds and thereby enhance the ability of intelligent vehicles to perceive the surrounding traffic environment.
Zheng et al. (Sat,) studied this question.
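The FBank feature extraction step mentioned in the abstract can be illustrated with a minimal NumPy sketch of log mel filter bank energies. This is not the authors' implementation: the sampling rate (16 kHz), frame length (400 samples), hop length (160 samples), FFT size (512), number of mel bands (64), the HTK-style mel formula, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def hz_to_mel(hz):
    """Convert Hz to mel (HTK-style formula; an assumed choice)."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Build an (n_mels, n_fft // 2 + 1) matrix of triangular bandpass filters."""
    # Filter edges are equally spaced on the mel scale, then mapped back to Hz.
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_edges = mel_to_hz(mel_edges)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_edges[m], hz_edges[m + 1], hz_edges[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        # Triangle: rises from left to center, falls from center to right.
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def fbank(signal, sample_rate=16000, frame_len=400, hop_len=160, n_fft=512, n_mels=64):
    """Return log mel filter bank energies with shape (n_frames, n_mels)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        # Zero-pad very short inputs so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each windowed frame (zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    mel_energies = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    # Small constant avoids log(0) in silent bands.
    return np.log(mel_energies + 1e-10)


if __name__ == "__main__":
    # One second of a synthetic 1 kHz tone as a stand-in for a traffic sound clip.
    t = np.arange(16000) / 16000.0
    features = fbank(np.sin(2 * np.pi * 1000.0 * t))
    print(features.shape)
```

Unlike MFCCs, this sketch stops at the log filter bank energies and applies no discrete cosine transform, so the per-band information is passed on unchanged. The resulting time-by-band matrix is the kind of two-dimensional input a convolutional network such as CNN10 would consume.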