April 2, 2024

Precision in Audio: CNN+LSTM-Based 3D Sound Event Localization and Detection in Real-world Environments

Abstract

Sound event localization and detection (SELD) is an emerging and key research area that will help shape the machines of the future. It combines two problems: sound event detection (SED) and direction-of-arrival (DOA) estimation. This paper proposes a convolutional recurrent neural network (CRNN) with Long Short-Term Memory (LSTM) layers for spatially localizing and classifying sound sources. The convolutional layers extract spatial features, while the LSTM layers capture the long-range temporal dependencies in the audio data. An attention mechanism, normalization, pooling, and dropout regularization are added to make the SELD model effective. The model is evaluated on the SELD score, which is a function of the SED and DOA scores. The results suggest that the proposed architecture is promising at localizing and distinguishing spatially and semantically different sound sources in the environment.
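
The abstract does not spell out how the SELD score combines the SED and DOA scores. As a rough illustration only, the sketch below assumes the definition used in the DCASE 2019 SELD challenge, where each sub-score averages two normalized error terms and the SELD score averages the two sub-scores (lower is better). The function names and the input values are hypothetical and are not results from the paper.

```python
def sed_score(error_rate: float, f_score: float) -> float:
    """Detection sub-score: mean of the error rate and (1 - F-score)."""
    return (error_rate + (1.0 - f_score)) / 2.0


def doa_score(doa_error_deg: float, frame_recall: float) -> float:
    """Localization sub-score: mean of the DOA error scaled by 180 degrees
    and (1 - frame recall)."""
    return (doa_error_deg / 180.0 + (1.0 - frame_recall)) / 2.0


def seld_score(error_rate: float, f_score: float,
               doa_error_deg: float, frame_recall: float) -> float:
    """Assumed DCASE 2019-style SELD score: mean of the SED and DOA sub-scores."""
    return (sed_score(error_rate, f_score)
            + doa_score(doa_error_deg, frame_recall)) / 2.0


if __name__ == "__main__":
    # Made-up metric values, chosen only to show the arithmetic.
    print(seld_score(error_rate=0.30, f_score=0.80,
                     doa_error_deg=18.0, frame_recall=0.85))
```

Under this assumed definition, a perfect system (zero error rate, F-score of 1, zero DOA error, frame recall of 1) scores 0, so a lower SELD score indicates better joint detection and localization.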
