June 28, 2024 | Open Access

An Image-Text Sentiment Analysis Method Using Multi-Channel Multi-Modal Joint Learning

Abstract

Multimodal sentiment analysis integrates multiple modalities to analyze sentiment tendencies or emotional states. Existing approaches face two challenges: redundancy in the features of each individual modality and a lack of correlation analysis between modalities, which lead to insufficient fusion and reduced accuracy. To address these issues, this study proposes a multi-channel multimodal joint learning method for image-text sentiment analysis. First, a multi-channel feature extraction module is introduced to comprehensively capture image or text features. Second, modality-wise interaction modules enable effective interaction between multimodal features and eliminate redundant features through cross-modal cross-attention. Finally, to account for the complementary role of contextual information in sentiment analysis, an adaptive multi-task fusion method merges single-modal context features with multimodal features to improve the reliability of sentiment predictions. Experimental results show that the proposed method achieves accuracies of 76.98% and 75.32% on the MVSA-Single and MVSA-Multiple datasets, with F1 scores of 76.23% and 75.29%, respectively, outperforming other state-of-the-art methods. This research provides new insights and methods for multimodal feature fusion, improving the accuracy and practicality of sentiment analysis.
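
The abstract names cross-modal cross-attention as the interaction mechanism but gives no implementation details. As a rough illustration only, the sketch below shows generic scaled dot-product cross-attention in which text token features act as queries over image region features. It is not the paper's architecture: the function names, feature dimensions, and random projection matrices are all assumptions chosen for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Generic scaled dot-product cross-attention (illustrative, not the paper's module).

    queries: (n_q, d_q) features from one modality, e.g. text tokens
    context: (n_c, d_c) features from the other modality, e.g. image regions
    w_q, w_k, w_v: projection matrices into a shared space of size d_model
    """
    q = queries @ w_q                          # (n_q, d_model)
    k = context @ w_k                          # (n_c, d_model)
    v = context @ w_v                          # (n_c, d_model)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (n_q, n_c) similarity scores
    weights = softmax(scores, axis=-1)         # each query row sums to 1
    return weights @ v, weights


# Hypothetical sizes: 5 text tokens, 3 image regions, shared dimension 4.
rng = np.random.default_rng(0)
d_text, d_image, d_model = 8, 6, 4
text_tokens = rng.normal(size=(5, d_text))
image_regions = rng.normal(size=(3, d_image))
w_q = rng.normal(size=(d_text, d_model))
w_k = rng.normal(size=(d_image, d_model))
w_v = rng.normal(size=(d_image, d_model))

attended, weights = cross_attention(text_tokens, image_regions, w_q, w_k, w_v)
print(attended.shape)  # one image-informed vector per text token: (5, 4)
```

Each text token receives a weighted mixture of image region values, so regions with low attention weight contribute little to the result. Swapping the roles of the two inputs gives the image-to-text direction.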
