What question did this study set out to answer?

The aim is to develop a framework that integrates emotional and structural features of traditional Chinese painting and music for improved matching.

April 20, 2026

A multimodal matching method for traditional Chinese painting and instrumental music integrating emotional and structural features

Key Points

Introduces a Painting-Audio Multimodal Matching (PAMM) framework that matches paintings and music on shared emotional and structural attributes.
Implements enhanced emotion recognition models for both painting and instrumental music.
Conducts experimental validation using a self-constructed dataset.
Achieves 89.1% accuracy in emotion recognition for Chinese painting on the self-constructed dataset, outperforming DenseNet, ResNet, and the unoptimized ConvNeXt model.
The music emotion recognition model significantly surpasses mainstream RNN-based approaches (e.g. LSTM, BiGRU).
Validates PAMM's effectiveness through case studies in emotional and structural dimensions.

Abstract

As significant carriers of Chinese culture, traditional Chinese painting and traditional Chinese instrumental music embody unique modes of emotional expression. Although cross-modal studies between painting and music have become increasingly common, most remain limited to simple emotional alignment or end-to-end “black-box” matching, without fully integrating the artistic characteristics of Chinese painting (such as brushwork, composition, and the use of negative space) with the distinctive features of traditional instrumental music, including qikou (breath phrasing) and scale structures. To address this gap, this article proposes improved emotion recognition models for Chinese painting and traditional instrumental music, respectively, enhancing emotional classification performance in the visual and auditory modalities. Building on this, we introduce a Painting-Audio Multimodal Matching (PAMM) framework based on shared emotional and structural attributes. This framework incorporates theories from psychology, esthetics, and human perceptual cognition, enabling synesthetic matching through emotion recognition and cross-modal feature analysis. Experimental results show that our Chinese painting emotion recognition model achieves an accuracy of 89.1 per cent on a self-constructed dataset, outperforming existing approaches such as DenseNet, ResNet, and the unoptimized ConvNeXt model (by 5 per cent). The music emotion recognition model also significantly surpasses mainstream RNN-based methods (e.g. LSTM, BiGRU) and other existing techniques. Case studies of cross-modal matching further validate PAMM’s effectiveness in both emotional and structural dimensions, offering a novel perspective for the digital preservation and cross-modal exploration of traditional Chinese art.
