What question did this study set out to answer?

This research aims to develop a low-complexity pooling strategy to enhance acoustic scene classification performance using convolutional neural networks.

May 1, 2026

Multi-scale pyramid pooling with low complexity for acoustic scene classification.

Key Points

Aimed to develop a low-complexity pooling strategy that improves CNN-based acoustic scene classification.
Implemented multi-scale pyramid pooling across convolutional layers of varying depths.
Analyzed correlations among local feature maps with different time-frequency details.
Evaluated performance on DCASE 2019 and DCASE 2020 datasets.
MSP modules improved baseline CNN performance by 5.26% on the DCASE 2019 dataset and by 4.38% on the DCASE 2020 dataset.
These gains required only 4.99k additional parameters.

Abstract

Acoustic scene classification (ASC) focuses on recognizing and characterizing acoustic environments. Convolutional neural networks (CNNs) are widely used in ASC because they can learn local time-frequency information from spectrograms. High-performance CNNs typically require a large number of parameters; however, current ASC systems are predominantly deployed on lightweight devices whose parameter budgets limit how effectively such CNNs can be used. This paper presents a low-complexity multi-scale pyramid pooling (MSP) strategy for CNNs, implemented across convolutional layers of varying depths, to enhance the performance of baseline CNNs under limited parameter budgets. Specifically, MSP analyzes the contribution of various sound events to specific scenes by capturing the correlation information among local feature maps with varying time-frequency details. Experimental results on multiple ASC datasets show that MSP modules considerably improve the performance of baseline CNNs: with only 4.99k additional parameters, they yield improvements of 5.26% and 4.38% on the DCASE 2019 and DCASE 2020 datasets, respectively. These results indicate that the proposed MSP module can effectively improve resource-constrained ASC systems, with potential applications in real-world scenarios such as intelligent surveillance, smart wearable devices, and edge-based audio monitoring systems.
