Abstract

As significant carriers of Chinese culture, traditional Chinese painting and traditional Chinese instrumental music embody distinctive modes of emotional expression. Although cross-modal studies of painting and music have become increasingly common, most remain limited to simple emotional alignment or end-to-end “black-box” matching. They do not fully integrate the artistic characteristics of Chinese painting (such as brushwork, composition, and the use of negative space) with the distinctive features of traditional instrumental music, including qikou (breath phrasing) and scale structures. To address this gap, this article proposes improved emotion recognition models for Chinese painting and for traditional instrumental music, enhancing emotion classification performance in the visual and auditory modalities respectively. Building on these models, we introduce a Painting-Audio Multimodal Matching (PAMM) framework based on shared emotional and structural attributes. The framework draws on theories from psychology, esthetics, and human perceptual cognition, and enables synesthetic matching through emotion recognition and cross-modal feature analysis. Experimental results show that our Chinese painting emotion recognition model achieves an accuracy of 89.1 per cent on a self-constructed dataset, outperforming existing approaches such as DenseNet and ResNet, and exceeding the unoptimized ConvNeXt model by 5 per cent. The music emotion recognition model also significantly surpasses mainstream RNN-based methods (e.g. LSTM, BiGRU) and other existing techniques. Case studies of cross-modal matching further validate PAMM’s effectiveness in both emotional and structural dimensions, offering a novel perspective on the digital preservation and cross-modal exploration of traditional Chinese art.
Ge et al. studied this question.