Emotion classification in music is an evolving area of affective computing with promising applications in music therapy, personalized media experiences, and adaptive systems. This study proposes a machine learning framework for multi-label emotion recognition in music, built on the publicly available Emotify dataset. The framework combines acoustic features, listener metadata (e.g., mood, age, gender), and genre classification to improve predictive accuracy. We conducted six experimental modeling phases using Random Forest, Multi-Layer Perceptron (MLP), XGBoost, and their ensemble variants. The final model, a stacking ensemble that combines the three base learners with a logistic regression meta-classifier, achieved the best results: a subset accuracy of 0.41, a Hamming loss of 0.25, and a macro-averaged F1 score of 0.67. Comparative analysis with recent studies indicates that our approach achieves a higher macro-F1 than prior clip-level, multi-label methods evaluated on the Emotify dataset under comparable settings. These results underline the importance of ensemble strategies and multi-modal feature fusion in modeling complex emotional landscapes. The proposed model advances the state of the art, supports the development of emotionally adaptive systems, and shows potential for future music-therapy applications, subject to further usability and clinical-effectiveness evaluation.
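The three reported metrics measure different things, which is why a subset accuracy of 0.41 can sit alongside a Hamming loss of 0.25. The sketch below is a minimal illustration of their standard definitions in plain Python. It is not the author's evaluation code: the function names, the three-label toy vectors, and the convention of scoring a label's F1 as 0 when it has no true or predicted positives are all assumptions made for this example, and none of the numbers come from Emotify.

```python
from typing import List, Sequence

Labels = Sequence[int]  # one 0/1 indicator per emotion tag (hypothetical layout)


def subset_accuracy(y_true: List[Labels], y_pred: List[Labels]) -> float:
    """Fraction of clips whose entire label vector is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


def hamming_loss(y_true: List[Labels], y_pred: List[Labels]) -> float:
    """Fraction of individual (clip, label) slots that are predicted wrongly."""
    wrong = sum(ti != pi for t, p in zip(y_true, y_pred) for ti, pi in zip(t, p))
    return wrong / (len(y_true) * len(y_true[0]))


def macro_f1(y_true: List[Labels], y_pred: List[Labels]) -> float:
    """Unweighted mean of the per-label F1 scores."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Assumed convention: a label with no positives at all scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


# Toy data: 4 clips, 3 hypothetical emotion tags (not taken from Emotify).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

print("subset accuracy:", subset_accuracy(y_true, y_pred))
print("Hamming loss:   ", round(hamming_loss(y_true, y_pred), 4))
print("macro F1:       ", round(macro_f1(y_true, y_pred), 4))
```

In this toy case only two of the twelve label slots are wrong, yet half of the clips fail the exact-match test. Subset accuracy penalizes a clip for a single wrong tag, so it is typically much lower than the per-label view given by Hamming loss and macro-F1.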
Jing Wu
International Journal of Computational Intelligence Systems
Hunan University of Traditional Chinese Medicine
www.synapsesocial.com/papers/69f04e7d727298f751e72734 — DOI: https://doi.org/10.1007/s44196-026-01304-0