What question did this study set out to answer?

The study aims to improve automated aesthetic evaluation with a unified multimodal framework that integrates quantitative aesthetic scores and qualitative text descriptions.

January 25, 2026 · Open Access

Unifying Aesthetic Evaluation via Multimodal Annotation and Fine-Grained Sentiment Analysis

Key Points

Aims to improve automated aesthetic evaluation with a unified multimodal framework that integrates quantitative scores and qualitative descriptions.
Develops the Textual Aesthetic Sentiment Labeling Pipeline (TASLP) for automatic annotation.
Constructs the Reddit Multimodal Sentiment Dataset (RMSD), which pairs aesthetic scores with aesthetic descriptions.
Introduces the Aesthetic Category Sentiment Analysis (ACSA) task to model fine-grained aesthetic attributes across modalities.
Designs two models, LAGA for aesthetic scoring and ACSFM for aesthetic captioning, that use ACSA features to improve consistency and interpretability.
Alleviates data limitations in aesthetic evaluation, according to experiments on RMSD and public benchmarks.
Delivers competitive performance on public benchmarks and on the constructed dataset.
Highlights the effectiveness of fine-grained sentiment modeling and multimodal learning for aesthetic evaluation.

Abstract

With the rapid growth of visual content, automated aesthetic evaluation has become increasingly important. However, existing research faces three key challenges: (1) the absence of datasets combining Image Aesthetic Assessment (IAA) scores and Image Aesthetic Captioning (IAC) descriptions; (2) limited integration of quantitative scores and qualitative text, hindering comprehensive modeling; (3) the subjective nature of aesthetics, which complicates consistent fine-grained evaluation. To tackle these issues, we propose a unified multimodal framework. To address the lack of data, we develop the Textual Aesthetic Sentiment Labeling Pipeline (TASLP) for automatic annotation and construct the Reddit Multimodal Sentiment Dataset (RMSD) with paired IAA and IAC labels. To improve annotation integration, we introduce the Aesthetic Category Sentiment Analysis (ACSA) task, which models fine-grained aesthetic attributes across modalities. To handle subjectivity, we design two models—LAGA for IAA and ACSFM for IAC—that leverage ACSA features to enhance consistency and interpretability. Experiments on RMSD and public benchmarks show that our approach alleviates data limitations and delivers competitive performance, highlighting the effectiveness of fine-grained sentiment modeling and multimodal learning in aesthetic evaluation.
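To make the idea of paired annotations concrete, the following is a minimal sketch of what a single record combining a score, a description, and category-level sentiment labels might look like. It is not the authors' data format: the class name `PairedAestheticRecord`, the field names, the three-way sentiment label set, the score scale, and the category names in the usage example are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass, field
from typing import Dict

# Assumed label set; the abstract does not state which sentiment labels are used.
SENTIMENTS = ("negative", "neutral", "positive")


@dataclass
class PairedAestheticRecord:
    """Hypothetical record pairing a quantitative score with a qualitative description."""

    image_id: str
    aesthetic_score: float  # IAA-style label; the scale is an assumption
    description: str        # IAC-style label
    # ACSA-style labels: aesthetic category -> sentiment (categories are illustrative)
    category_sentiment: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject labels outside the assumed sentiment set.
        for category, sentiment in self.category_sentiment.items():
            if sentiment not in SENTIMENTS:
                raise ValueError(f"unknown sentiment {sentiment!r} for {category!r}")


def sentiment_balance(record: PairedAestheticRecord) -> float:
    """Return (positive count - negative count) / total labels, or 0.0 if unlabeled."""
    labels = list(record.category_sentiment.values())
    if not labels:
        return 0.0
    return (labels.count("positive") - labels.count("negative")) / len(labels)


example = PairedAestheticRecord(
    image_id="img_0001",
    aesthetic_score=7.5,
    description="Strong composition and warm colors, but the subject is slightly soft.",
    category_sentiment={
        "composition": "positive",
        "color": "positive",
        "sharpness": "negative",
        "lighting": "neutral",
    },
)
balance = sentiment_balance(example)
```

A structure along these lines keeps the score and the text attached to the same image, so category-level sentiment can be compared against the overall score; how the actual framework uses such features is described in the paper itself, not here.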
