What does this research mean for the field?

The Traffic Sound Event Detection Convolutional Neural Network (TSED-CNN) achieves an accuracy of 96.378% in detecting traffic sounds, improving upon the baseline model by 1.705%.

What question did this study set out to answer?

The aim is to enhance traffic sound event detection using auditory perception in intelligent vehicles.

March 3, 2026 | Open Access

Research on Traffic Sound Event Detection Based on Multi-Scale Feature Fusion

Key Points

The aim is to enhance traffic sound event detection using auditory perception in intelligent vehicles.
Developed a traffic sound detection model based on PANNs-CNN10.
Extracted acoustic features of traffic sounds using a filter bank (FBank).
Introduced multi-scale convolution blocks and attention modules in the network architecture.
Achieved an accuracy of 96.378% for traffic sound detection.
Improved baseline model performance by 1.705%.
Enhanced the ability of intelligent vehicles to perceive their surroundings.
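
The FBank extraction step listed in the key points can be illustrated with a minimal NumPy sketch of log mel filter-bank features. The function names (`hz_to_mel`, `mel_to_hz`, `mel_filterbank`, `fbank`) and the default parameter values (32 kHz sampling rate, 1024-sample Hamming window, 320-sample hop, 64 bands) are illustrative assumptions; this summary does not state the settings used in the study.

```python
import numpy as np

def hz_to_mel(hz):
    # Common HTK-style mel scale formula.
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular band-pass filters, shape (n_mels, n_fft // 2 + 1)."""
    # Band edges are equally spaced on the mel scale from 0 Hz to Nyquist.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    fb = np.zeros((n_mels, freqs.size))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def fbank(signal, sample_rate=32000, n_fft=1024, hop=320, n_mels=64):
    """Log filter-bank energies, shape (n_frames, n_mels). Parameters are assumed values."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    # Small offset avoids log(0) in silent bands.
    return np.log(energies + 1e-10)
```

Calling `fbank(waveform)` on a one-dimensional NumPy array returns one row of band energies per frame. Unlike MFCCs, no discrete cosine transform is applied afterwards, so each column still corresponds to one band-pass filter.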

Abstract

The perception system of an intelligent vehicle must accurately recognize the surrounding traffic environment, which is crucial for safe and efficient autonomous driving. However, most existing systems rely solely on visual perception. Auditory perception is a complementary approach that can address the limitations of visual perception and provide a more reliable basis for decision-making, and sound event detection (SED) is the core technique for implementing it. This study proposes a traffic sound event detection model based on PANNs-CNN10. First, acoustic features of traffic sounds are extracted using a filter bank (FBank). FBank features retain the frequency-domain information of the bandpass filters in full, which facilitates learning complex feature relationships. Second, a multi-scale convolution block is introduced in the intermediate layers of the network so that it learns features at different scales, improving the expressiveness of the model. Third, a hybrid multi-scale attention module and a Shuffle Attention module are introduced in the intermediate and deep layers. These modules focus on the correlation between channels and enhance the network's ability to capture key features. The resulting model is the Traffic Sound Event Detection Convolutional Neural Network (TSED-CNN). TSED-CNN achieved an accuracy of 96.378% for traffic sound event detection, an improvement of 1.705% over the baseline model. The results show that the proposed method accurately detects traffic sounds and enhances the ability of intelligent vehicles to perceive the surrounding traffic environment.
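
The multi-scale convolution idea in the abstract can be sketched as parallel convolutions with different kernel sizes whose outputs are concatenated along the channel axis. This is an Inception-style illustration in plain NumPy with assumed kernel sizes (1, 3, 5) and random weights; the summary does not give the actual block design of TSED-CNN, and `conv2d_same`, `multi_scale_block`, and `make_weights` are hypothetical helper names.

```python
import numpy as np

def conv2d_same(x, w):
    """Zero-padded 'same' cross-correlation.

    x: feature map of shape (C_in, H, W), e.g. (channels, mel bands, frames)
    w: kernels of shape (C_out, C_in, k, k) with odd k
    """
    k = w.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    # windows has shape (C_in, H, W, k, k): one k-by-k patch per output position.
    windows = np.lib.stride_tricks.sliding_window_view(xp, (k, k), axis=(1, 2))
    return np.einsum("chwij,ocij->ohw", windows, w)

def multi_scale_block(x, weights):
    """Apply one conv + ReLU branch per kernel size and concatenate the channels."""
    branches = [np.maximum(conv2d_same(x, w), 0.0) for w in weights]
    return np.concatenate(branches, axis=0)

def make_weights(c_in, c_out_per_branch, kernel_sizes=(1, 3, 5), seed=0):
    """Random stand-in weights; a trained network would learn these."""
    rng = np.random.default_rng(seed)
    return [0.1 * rng.standard_normal((c_out_per_branch, c_in, k, k))
            for k in kernel_sizes]
```

Each branch sees the same input at a different receptive-field size, so the concatenated output mixes fine and coarse time-frequency patterns. With three branches of 8 output channels each, a `(4, H, W)` input yields a `(24, H, W)` output.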

Cite This Study

Zheng et al. studied this question.

synapsesocial.com/papers/69a67eebf353c071a6f0a8c7 https://doi.org/10.3390/app16052359
