What question did this study set out to answer?

This research aims to improve the detection of multiple pitches from various instruments in music signals using deep learning techniques.

May 14, 2026

A study on multi-instrument pitch detection based on multi-pitch estimation and instrument recognition

Key Points

Developed a baseline multi-instrument multi-pitch estimation (MI-MPE) model without auxiliary inputs.
Compared MI-MPE performance with dedicated multi-pitch estimation (MPE) and instrument recognition (IR) models.
Incorporated outputs from MPE and IR models into a new MI-MPE model for enhanced modeling.
The baseline MI-MPE model showed comparable or higher precision, but significantly lower recall and lower average precision, than the MPE and IR models.
The new MI-MPE model incorporating MPE and IR outputs exhibited improved precision metrics.
Challenges remain in detecting certain pitches and instruments effectively.

Abstract

In practical automatic music transcription, detecting multiple pitches for each of multiple instruments within a music signal is essential. This task is addressed by multi-instrument multi-pitch estimation (MI-MPE). Our approach involves constructing a deep learning model for MI-MPE that leverages the outputs from two models: the MPE model, which performs instrument-agnostic multi-pitch estimation, and the IR model, which performs pitch-agnostic frame-level instrument recognition. First, as a preliminary experiment, we developed a baseline MI-MPE model that does not incorporate MPE or IR outputs and compared its performance with that of the MPE and IR models. All three models share similar architectures consisting of convolutional layers and Transformer blocks. For comparison, the MI-MPE outputs are projected into the MPE or IR formats by applying a max operation along the instrument or pitch dimension, respectively. Experimental results showed that the projected outputs of the MI-MPE model exhibit comparable or higher precision, but significantly lower recall and lower average precision, than the MPE and IR models. These results suggest that the baseline MI-MPE model has difficulty detecting certain pitches and instruments. To address this, we constructed a new MI-MPE model incorporating outputs from the MPE and IR models as auxiliary inputs and evaluated its performance.
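
The projection step described in the abstract can be illustrated with a minimal sketch. It assumes the MI-MPE output is stored as nested lists indexed as `[frame][instrument][pitch]`, holding activation scores between 0 and 1. That layout, the function names, and the example values are hypothetical and are not taken from the study.

```python
# Minimal sketch of projecting MI-MPE outputs into MPE and IR formats.
# Assumed (hypothetical) layout: mi_mpe[frame][instrument][pitch] -> score.


def project_to_mpe(mi_mpe):
    """Max over the instrument axis: result[frame][pitch]."""
    return [
        [max(inst_row[p] for inst_row in frame) for p in range(len(frame[0]))]
        for frame in mi_mpe
    ]


def project_to_ir(mi_mpe):
    """Max over the pitch axis: result[frame][instrument]."""
    return [[max(inst_row) for inst_row in frame] for frame in mi_mpe]


# Toy example: 2 frames, 2 instruments, 3 pitches (made-up scores).
example = [
    [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1]],
    [[0.0, 0.0, 0.3], [0.0, 0.8, 0.4]],
]

mpe_like = project_to_mpe(example)  # one score per (frame, pitch)
ir_like = project_to_ir(example)    # one score per (frame, instrument)
```

In this sketch a pitch counts as active in the MPE-style output if any instrument plays it, and an instrument counts as active in the IR-style output if it plays any pitch, which is what makes the projected outputs directly comparable with the dedicated MPE and IR models.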
