What does this research mean for the field?

ArtFusionNet predicts emotion distributions from artistic paintings and outperforms several strong baseline models on the ArtEmis dataset. Novelty: novel finding. Consensus alignment: establishes a new direction.

What question did this study set out to answer?

The aim is to predict emotional responses to artistic images using a novel deep learning framework.

March 14, 2026

ArtFusionNet: Multilevel Feature Fusion for Emotion Distribution Prediction in Artistic Images

Key Points

Aimed to predict emotional responses to artistic images using a novel deep learning framework
Developed ArtFusionNet to integrate multi-level visual features
Fused perceptual features with high-level semantic features
Used stylistic embeddings from discrete style labels
Predicted a 9-dimensional emotion distribution vector
ArtFusionNet significantly outperformed several baseline models
Ablation studies highlighted the complementary roles of low-level texture cues and stylistic context
The model enhanced sensitivity to subtle emotional nuances in art
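
To make the prediction target concrete, the following is a minimal sketch of how a 9-dimensional emotion distribution could be derived from individual annotator labels. The nine category names follow the ArtEmis label set; the ordering, the function name `emotion_distribution`, and the sample votes are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter

# The nine ArtEmis emotion categories (the order here is an arbitrary choice).
EMOTIONS = [
    "amusement", "awe", "contentment", "excitement",
    "anger", "disgust", "fear", "sadness", "something else",
]


def emotion_distribution(votes):
    """Turn a list of per-annotator emotion labels into a 9-dim distribution."""
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    unknown = set(counts) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotion labels: {sorted(unknown)}")
    total = len(votes)
    # Relative frequency of each category, in the fixed EMOTIONS order.
    return [counts[name] / total for name in EMOTIONS]


# Hypothetical votes from five annotators for one painting.
sample_votes = ["awe", "awe", "contentment", "sadness", "awe"]
target = emotion_distribution(sample_votes)
```

Here `target` is a vector of nine non-negative values that sum to one, which is the kind of output a distribution-prediction model is trained to approximate.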

Abstract

Introduction: Predicting viewers’ emotional responses to visual stimuli has become a central topic in affective computing, with numerous deep learning models proposed for estimating emotion distributions from general images. However, artistic images present a unique challenge due to their abstract nature, rich stylistic variance, and the deeply subjective emotional reactions they evoke. Existing efforts largely focus on natural or photographic imagery, leaving the prediction of emotion distributions in artistic paintings underexplored.

Methods: To bridge this gap, we propose ArtFusionNet, a novel multimodal deep learning framework designed to predict emotion distributions from artistic paintings. Our framework integrates and fuses multi-level and spatially aligned visual features to form a comprehensive emotional representation. Specifically, it combines perceptual features and high-level semantic features with artistic style embeddings derived from discrete style labels. These heterogeneous features are unified into a single representation used to predict a 9-dimensional emotion distribution vector.

Results: Experimental results on the ArtEmis dataset demonstrate that ArtFusionNet significantly outperforms several strong baselines. Ablation studies further reveal the complementary roles of low-level texture cues and stylistic context in enhancing sensitivity to subtle emotional nuances in art.

Discussion: Our approach introduces a spatially aware and stylistically enriched pathway for affective understanding of visual art, contributing novel insights to the relatively underrepresented domain of emotion distribution prediction for artistic imagery.

Conclusion: In this paper, we proposed ArtFusionNet, a novel deep learning framework that uniquely integrates multi-modal features to predict subjective emotion distributions from artistic paintings. By leveraging global semantics from a visual transformer, local perceptual details from a VAE encoder, and contextual priors from learnable style embeddings, our model effectively captures the complexity and subjectivity of art-evoked emotions.
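
The fusion idea described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: it assumes the semantic and perceptual features have already been pooled into flat vectors (the paper works with spatially aligned features), it assumes fusion by plain concatenation followed by a single linear layer and softmax, and all dimensions, weights, and names such as `predict_emotions` and `style_table` are hypothetical.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes chosen only for illustration.
SEM_DIM, PERC_DIM, STYLE_DIM = 8, 6, 4
NUM_STYLES, NUM_EMOTIONS = 5, 9

rng = random.Random(0)
# One embedding row per discrete style label (learned in a real model).
style_table = init_matrix(NUM_STYLES, STYLE_DIM, rng)
head_w = init_matrix(NUM_EMOTIONS, SEM_DIM + PERC_DIM + STYLE_DIM, rng)
head_b = [0.0] * NUM_EMOTIONS


def predict_emotions(semantic_feat, perceptual_feat, style_id):
    """Fuse three feature sources and return a 9-dim emotion distribution."""
    style_feat = style_table[style_id]
    # Assumed fusion: simple concatenation into a single representation.
    fused = semantic_feat + perceptual_feat + style_feat
    return softmax(linear(fused, head_w, head_b))


# Random stand-ins for transformer (semantic) and VAE-encoder (perceptual) features.
semantic = [rng.gauss(0, 1) for _ in range(SEM_DIM)]
perceptual = [rng.gauss(0, 1) for _ in range(PERC_DIM)]
probs = predict_emotions(semantic, perceptual, style_id=2)
```

The softmax at the end guarantees that `probs` is a valid distribution over the nine emotion categories, whatever the fused features look like. A real model would replace the random vectors with encoder outputs and train the weights against annotated distributions.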


Cite This Study

Li et al. studied this question.

synapsesocial.com/papers/69b4fbc1b39f7826a300c1ca https://doi.org/10.2174/0115748936424032251124052456
