Los puntos clave no están disponibles para este artículo en este momento.
Musical instrument recognition is a relatively unexplored area of machine learning due to the need to analyze complex spatial-temporal audio features. Traditional methods using individual spectrograms, like STFT, Log-Mel, and MFCC, often miss the full range of features. We propose a hierarchical residual attention network using a scaled combination of multiple spectrograms, including STFT, Log-Mel, MFCC, and CST features (chroma, spectral contrast, and Tonnetz), to create a comprehensive sound representation. This model enhances focus on relevant spectrogram parts through attention mechanisms. Experimental results with the OpenMIC-2018 dataset show significant improvement in classification accuracy, especially with the Magnified 1/4 Size configuration. Future work will optimize CST feature scaling, explore advanced attention mechanisms, and apply the model to other audio tasks to assess its generalizability.
Chen et al. (Mon,) studied this question.