What question did this study set out to answer?

The aim is to improve emotion recognition in artworks by integrating visual and textual information.

June 20, 2026 | Open Access

A multimodal adaptive deep network for emotion recognition in artistic images

Key Points

Aimed to improve emotion recognition in artworks by integrating visual and textual information.
Proposed a multimodal adaptive emotion recognition network with a gated adaptive fusion module.
Introduced an emotion-aware contrastive learning pre-training strategy for aligning cross-modal features.
Conducted experiments on the ArtEmis dataset.
Achieved 71.3% accuracy, surpassing state-of-the-art methods by 2.4 percentage points.
Confirmed the effectiveness and interpretability of components through ablation and case studies.

Abstract

Recognising emotions in artworks is essential for digital galleries, personalised art recommendations, and art education. However, this task is challenging due to the abstract nature of images and subjective viewer interpretations, and existing methods often inadequately integrate visual content with textual descriptions. To address this issue, this paper proposes a multimodal adaptive emotion recognition network grounded in appraisal theory, featuring a gated adaptive fusion module that dynamically balances image and text contributions. An emotion-aware contrastive learning pre-training strategy is introduced to align cross-modal features. Experiments on the ArtEmis dataset show our method achieves 71.3% accuracy, surpassing state-of-the-art baselines by 2.4 percentage points. Ablation and case studies confirm the effectiveness and interpretability of each component. This work offers a promising solution for emotion understanding in art with demonstrable practical potential in controlled settings.
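The abstract does not specify how the gated adaptive fusion module is built. As a rough illustration of the general idea of a gate that balances image and text contributions, the sketch below uses a common formulation: a sigmoid gate computed from the concatenated features, followed by a per-dimension convex blend. The function name `gated_fusion`, the parameters `gate_weights` and `gate_bias`, and the toy vectors are all assumptions made for illustration, not the paper's implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(img_feat, txt_feat, gate_weights, gate_bias):
    """Blend one image vector and one text vector with a per-dimension gate.

    img_feat, txt_feat: lists of d floats.
    gate_weights: d rows of 2*d floats (hypothetical learned parameters).
    gate_bias: list of d floats.
    Returns (fused, gate), each a list of d floats.
    """
    concat = img_feat + txt_feat  # concatenated features, length 2*d
    # One gate value in (0, 1) per output dimension.
    gate = [
        sigmoid(sum(w * x for w, x in zip(row, concat)) + b)
        for row, b in zip(gate_weights, gate_bias)
    ]
    # A gate near 1 favours the image feature, near 0 the text feature.
    fused = [g * v + (1.0 - g) * t for g, v, t in zip(gate, img_feat, txt_feat)]
    return fused, gate


# Toy example (assumed values): zero weights and bias give a gate of 0.5
# in every dimension, so the fused vector is the plain average.
img = [1.0, 0.0, 2.0]
txt = [0.0, 4.0, 2.0]
d = len(img)
W = [[0.0] * (2 * d) for _ in range(d)]
b = [0.0] * d
fused, gate = gated_fusion(img, txt, W, b)
print(gate)   # [0.5, 0.5, 0.5]
print(fused)  # [0.5, 2.0, 2.0]
```

In a trained model the gate parameters would be learned, so the balance between modalities could differ for each artwork and each feature dimension. Inspecting the gate values is one plausible route to the kind of interpretability the abstract mentions.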
