September 23, 2024 | Open Access

Hierarchical Residual Attention Network for Musical Instrument Recognition Using Scaled Multi-Spectrogram

Abstract

Musical instrument recognition remains a relatively unexplored area of machine learning, in part because it requires analyzing complex spatio-temporal audio features. Traditional methods that rely on a single spectrogram type, such as STFT, Log-Mel, or MFCC, often fail to capture the full range of relevant features. We propose a hierarchical residual attention network that takes a scaled combination of multiple spectrograms, including STFT, Log-Mel, MFCC, and CST features (chroma, spectral contrast, and Tonnetz), to build a comprehensive representation of the sound. Attention mechanisms help the model focus on the relevant regions of the spectrograms. Experimental results on the OpenMIC-2018 dataset show a significant improvement in classification accuracy, especially with the Magnified 1/4 Size configuration. Future work will optimize the scaling of the CST features, explore more advanced attention mechanisms, and apply the model to other audio tasks to assess its generalizability.
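To make the idea of a scaled multi-spectrogram input concrete, the following is a minimal sketch of how features with different numbers of frequency bins might be normalized, resized to a common height, and stacked into one input. It is not the paper's implementation: the function names, the nearest-neighbour resizing, the min-max normalization, and the toy sizes are all assumptions chosen for illustration, and the abstract does not specify how the scaling or the Magnified 1/4 Size configuration is actually computed.

```python
# Hypothetical sketch: every name and size here is an illustrative assumption,
# not the configuration used in the paper.

def resize_rows(feature, target_rows):
    """Nearest-neighbour resize of a 2-D feature (rows = bins, cols = frames)."""
    n_rows = len(feature)
    return [list(feature[(i * n_rows) // target_rows]) for i in range(target_rows)]

def min_max_normalize(feature):
    """Rescale values to [0, 1] so features with different units are comparable."""
    flat = [v for row in feature for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for constant features
    return [[(v - lo) / span for v in row] for row in feature]

def stack_features(features, target_rows):
    """Normalize each feature, resize it to target_rows, and stack vertically."""
    n_frames = len(features[0][0])
    stacked = []
    for feature in features:
        if any(len(row) != n_frames for row in feature):
            raise ValueError("all features must share the same number of frames")
        stacked.extend(resize_rows(min_max_normalize(feature), target_rows))
    return stacked

# Toy stand-ins for real spectrograms: 3 frames each, different numbers of bins.
stft_like = [[float(r + c) for c in range(3)] for r in range(8)]  # shrunk 8 -> 4 rows
cst_like = [[float(r * c) for c in range(3)] for r in range(2)]   # magnified 2 -> 4 rows
combined = stack_features([stft_like, cst_like], target_rows=4)
```

Here the small CST-like block is magnified by row repetition while the larger STFT-like block is downsampled, so both contribute equally sized regions to the combined input. In practice the toy lists would be replaced by arrays produced by an audio feature library.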
