Analyzing music similarity in large catalogs is challenging because listeners perceive music differently and the relevant cues are spread across audio, text, and metadata. This article introduces a multimodal framework that uses an ontology to make music similarity and recommendation more explainable. The framework combines learned features from audio, lyrics, and other text with structured metadata in a shared similarity space, then refines the ranking with a music ontology that captures relationships between songs, artists, genres, and moods. The design works with any encoder that produces fixed-size features; this study uses neural audio and text encoders, mainly transformer-based. This approach allows the system to handle different input types while staying reliable across datasets. The framework is evaluated on several open music and audio datasets using content-based retrieval tasks and standard ranking measures. In addition to Configurations C1–C4, the evaluation includes an external content-based reference baseline built on conventional MIR audio descriptors. This baseline represents a signal-level retrieval approach that models complementary aspects of the audio signal, such as timbre, harmony, and spectral characteristics. It is evaluated under the same retrieval protocol as the main framework and provides a comparison point outside the proposed C1–C4 design. Compared to audio-only and non-ontological variants within the same framework, the proposed multimodal and ontology-guided configurations achieve better precision, recall, and mean average precision, and also cover more rare content. Visualizations and case studies show that combining different data types and using ontology-based reranking can improve performance and make results easier to interpret.
This work lays the groundwork for explainable, cognitively informed music recommendation systems and points to future work in modeling user behavior over time and adapting to different cultures.
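To make the pipeline described above concrete, the following is a minimal sketch of its three steps: fusing fixed-size per-modality features into a shared similarity space, reranking with ontology information, and scoring a ranking with average precision. It is an illustration under stated assumptions, not the implementation evaluated in this study. The function names, the weighted-concatenation fusion, the Jaccard overlap over ontology concepts, the mixing parameter `alpha`, and the toy tracks are all hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse(features, weights):
    """Assumed fusion: weighted concatenation of unit-length modality vectors.
    Every track is assumed to carry the same modalities with the same sizes."""
    fused = []
    for name in sorted(features):  # fixed order keeps dimensions aligned
        w = weights.get(name, 1.0)
        fused.extend(w * x for x in l2_normalize(features[name]))
    return l2_normalize(fused)

def cosine(a, b):
    """Cosine similarity for vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))

def rerank(query, candidates, weights, alpha=0.3):
    """Score = embedding similarity + alpha * ontology overlap (Jaccard).
    The shared concepts are returned as a simple explanation of each result."""
    q_vec = fuse(query["features"], weights)
    results = []
    for track in candidates:
        sim = cosine(q_vec, fuse(track["features"], weights))
        shared = query["concepts"] & track["concepts"]
        union = query["concepts"] | track["concepts"]
        overlap = len(shared) / len(union) if union else 0.0
        results.append((track["id"], sim + alpha * overlap, sorted(shared)))
    return sorted(results, key=lambda r: r[1], reverse=True)

def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list; MAP is its mean over queries."""
    hits, total = 0, 0.0
    for rank, track_id in enumerate(ranked_ids, start=1):
        if track_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0

# Toy data (hypothetical): two-dimensional "audio" and "text" features.
weights = {"audio": 1.0, "text": 1.0}
query = {"id": "q",
         "features": {"audio": [1.0, 0.0], "text": [0.0, 1.0]},
         "concepts": {"genre:jazz", "mood:calm"}}
track_a = {"id": "a",
           "features": {"audio": [0.8, 0.6], "text": [0.0, 1.0]},
           "concepts": {"genre:rock", "mood:energetic"}}
track_b = {"id": "b",
           "features": {"audio": [1.0, 0.0], "text": [0.8, 0.6]},
           "concepts": {"genre:jazz", "mood:calm"}}

ranked = rerank(query, [track_a, track_b], weights)
for track_id, score, shared in ranked:
    print(track_id, round(score, 3), shared)
```

In this toy example, track `a` is slightly closer to the query in the fused embedding space, but track `b` shares both ontology concepts with the query, so the ontology term moves `b` to the top and the shared concepts serve as a human-readable reason for the match. Setting `alpha=0` recovers the non-ontological ranking.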
Mikhail Rumiantcev studied this question.