Introduction: Predicting viewers’ emotional responses to visual stimuli has become a central topic in affective computing, and numerous deep learning models have been proposed for estimating emotion distributions from general images. However, artistic images present a unique challenge because of their abstract nature, rich stylistic variance, and the deeply subjective emotional reactions they evoke. Existing efforts focus largely on natural or photographic imagery, leaving the prediction of emotion distributions for artistic paintings underexplored.
Methods: To bridge this gap, we propose ArtFusionNet, a novel multimodal deep learning framework designed to predict emotion distributions from artistic paintings. Our framework integrates and fuses multi-level, spatially aligned visual features to form a comprehensive emotional representation. Specifically, it combines perceptual features and high-level semantic features with artistic style embeddings derived from discrete style labels. These heterogeneous features are unified into a single representation, which is used to predict a 9-dimensional emotion distribution vector.
Results: Experimental results on the ArtEmis dataset demonstrate that ArtFusionNet significantly outperforms several strong baselines. Ablation studies further reveal that low-level texture cues and stylistic context play complementary roles in enhancing sensitivity to subtle emotional nuances in art.
Discussion: Our approach introduces a spatially aware and stylistically enriched pathway for the affective understanding of visual art, contributing new insights to the relatively underrepresented domain of emotion distribution prediction for artistic imagery.
Conclusion: In this paper, we proposed ArtFusionNet, a novel deep learning framework that uniquely integrates multimodal features to predict subjective emotion distributions from artistic paintings. By leveraging global semantics from a vision transformer, local perceptual details from a VAE encoder, and contextual priors from learnable style embeddings, our model effectively captures the complexity and subjectivity of art-evoked emotions.
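The abstract does not give implementation details, so the following is only a minimal NumPy sketch of how the three feature streams could be spatially aligned and fused into a 9-dimensional emotion distribution. Everything concrete here is an assumption, not taken from the paper: ViT patch tokens on a 14×14 grid with 768 channels, a 4-channel VAE latent at 28×28, 27 discrete style labels, a 16-dimensional style embedding, a per-location ReLU projection followed by average pooling, and KL divergence as the distribution-matching loss. The names `FusionSketch`, `block_avg_pool`, and `kl_divergence` are hypothetical. Weights are random, so the sketch shows shapes and data flow only, not trained behavior.

```python
import numpy as np

NUM_EMOTIONS = 9  # length of the predicted emotion distribution vector


def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def block_avg_pool(fmap, grid):
    # Average-pool a (C, H, W) map down to (C, grid, grid).
    # Assumes H == W and that H is an integer multiple of grid.
    c, h, _ = fmap.shape
    k = h // grid
    return fmap.reshape(c, grid, k, grid, k).mean(axis=(2, 4))


class FusionSketch:
    """Hypothetical spatially aligned fusion head; not the authors' implementation."""

    def __init__(self, sem_dim, perc_dim, num_styles, style_dim, hidden, seed=0):
        rng = np.random.default_rng(seed)
        # Learnable style embedding table: one row per discrete style label.
        self.style_table = rng.normal(0.0, 0.02, (num_styles, style_dim))
        in_dim = sem_dim + perc_dim + style_dim
        # Per-location projection of the fused feature, then a linear output head.
        self.W1 = rng.normal(0.0, in_dim ** -0.5, (in_dim, hidden))
        self.W2 = rng.normal(0.0, hidden ** -0.5, (hidden, NUM_EMOTIONS))

    def __call__(self, patch_tokens, vae_latent, style_id):
        n, d = patch_tokens.shape
        g = int(round(n ** 0.5))                  # patch grid side, e.g. 14 for 196 tokens
        sem = patch_tokens.T.reshape(d, g, g)     # (D, g, g), row-major patch order
        perc = block_avg_pool(vae_latent, g)      # (C, g, g), aligned to the patch grid
        style = self.style_table[style_id]        # (S,)
        style_map = np.broadcast_to(style[:, None, None], (style.shape[0], g, g))
        fused = np.concatenate([sem, perc, style_map], axis=0)  # (D+C+S, g, g)
        tokens = fused.reshape(fused.shape[0], -1).T            # (g*g, D+C+S)
        hidden = np.maximum(tokens @ self.W1, 0.0)              # per-location ReLU projection
        pooled = hidden.mean(axis=0)                            # spatial average pooling
        return softmax(pooled @ self.W2)                        # 9-way emotion distribution


def kl_divergence(target, pred, eps=1e-12):
    # KL(target || pred): one common distribution-matching loss (assumed here).
    target = np.clip(target, eps, 1.0)
    pred = np.clip(pred, eps, 1.0)
    return float(np.sum(target * np.log(target / pred)))


# Usage with random stand-ins for real encoder outputs.
rng = np.random.default_rng(1)
model = FusionSketch(sem_dim=768, perc_dim=4, num_styles=27, style_dim=16, hidden=128)
tokens = rng.normal(size=(196, 768))    # stand-in for ViT patch tokens (14x14 grid)
latent = rng.normal(size=(4, 28, 28))   # stand-in for a 4-channel VAE latent
p = model(tokens, latent, style_id=3)
print(p.shape, round(float(p.sum()), 6))
```

Broadcasting the style embedding to every grid cell lets the per-location projection condition local perceptual and semantic features on the style prior before pooling; concatenating only after global pooling would discard that spatial interaction, which is one plausible reading of "spatially aligned" fusion.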
Li et al. studied this question.