What question did this study set out to answer?

This research explores how listeners perceive similarity at both the part level and the track level in music compositions.

May 10, 2026 | Open Access

Investigation of part-level perceptual music similarity by large-scale listening test

Key Points

Conducted an ABX-style listening test with 632 participants to evaluate music similarity.
Assessed similarity based on timbre, melody, rhythm, and overall impressions.
Analyzed data to understand the influence of instrumental parts on track-level similarity.
The instrumental parts that most affect track-level similarity vary across music triplets and listeners.
Deep learning model outputs align more closely with timbre evaluations than with rhythm evaluations when temporal averaging is applied.
Similarity perceived within segments of the same track is significantly higher than between segments from different tracks.

Abstract

This study investigates perceptual similarity at two levels: music tracks (track-level) and the individual instrumental parts that compose them (part-level). Previous work studied perceptual part-level similarity with the aim of developing a model that estimates it. An ABX-style listening test with 632 participants was conducted, evaluating similarity at both levels from the perspectives of timbre, melody, rhythm, and overall impression. Although the previous work drew some knowledge from these evaluations, further insights are needed to support the development of future estimation models. Specifically, important questions remain regarding the correspondence between track- and part-level similarity, the generalizability of findings across multiple models, and the validity of the conventional learning method in terms of perceptual similarity. This study revealed the following key findings: (1) the instrumental parts that predominantly affect track-level similarity differ across music triplets and listeners, with the differences across music triplets having a greater influence than the differences across listeners, indicating that part-level similarity helps in estimating track-level similarity; (2) when temporal averaging is applied, the output of the deep learning models corresponds more closely to the perceptual evaluation based on timbre than to the one based on rhythm, indicating a potential area for improvement in the models; (3) the similarity between temporally distinct segments within the same music track is perceived to be significantly higher than that between segments from different tracks, which supports the assumption of the conventional unsupervised learning method developed for music similarity estimation.
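To make the comparison in finding (2) concrete, the following is a minimal sketch of how a model's output could be scored against an ABX-style judgment after temporal averaging. It is not the authors' implementation: the frame-level embeddings, the use of cosine similarity, and every function name here are assumptions chosen for illustration.

```python
import math


def temporal_average(frames):
    """Average a sequence of frame-level embedding vectors over time."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def abx_choice(x_frames, a_frames, b_frames):
    """Return 'A' or 'B': the candidate whose averaged embedding is closer to X."""
    x = temporal_average(x_frames)
    sim_a = cosine_similarity(x, temporal_average(a_frames))
    sim_b = cosine_similarity(x, temporal_average(b_frames))
    return "A" if sim_a >= sim_b else "B"


def agreement_rate(model_choices, listener_choices):
    """Fraction of triplets where the model picks the same item as the listeners."""
    matches = sum(m == l for m, l in zip(model_choices, listener_choices))
    return matches / len(model_choices)


# Toy triplet (hypothetical 2-dimensional embeddings, two frames each).
x = [[1.0, 0.0], [0.8, 0.2]]
a = [[0.9, 0.1], [1.0, 0.0]]
b = [[0.0, 1.0], [0.1, 0.9]]
choice = abx_choice(x, a, b)
```

Because the mean over frames does not depend on frame order, a representation built this way cannot distinguish two segments that contain the same frames in a different sequence. That property may be one reason an averaged output tracks timbre more closely than rhythm, though the abstract itself does not state a cause.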

Cite This Study

Hashizume et al. studied this question.

synapsesocial.com/papers/6a002147c8f74e3340f9c174 https://doi.org/10.1108/atsip-07-2025-0065
