June 13, 2026 · Open Access

A modality-aware contrastive learning framework for multimodal sentiment analysis

Key Points

This research aims to enhance multimodal sentiment analysis by addressing feature fusion and cross-modal relationship challenges.
Proposed the Multimodal-Aware Contrastive Learning (MACL) framework.
Introduced a Dynamic Multi-Scale Attention (DMSA) mechanism for intra-modal feature extraction.
Validated performance on the CMU-MOSI and CMU-MOSEI datasets.
MACL outperformed existing state-of-the-art methods in recognition accuracy.
Comparative analysis showed improved robustness and generalization.
The model effectively captured subtle emotional cues across modalities.

Abstract

Multimodal sentiment analysis (MSA) is essential for human-computer interaction. It combines textual, visual, and acoustic signals to improve the precision of emotion recognition. Despite recent advances, current methods still struggle with fine-grained feature fusion and do not adequately model complex cross-modal relationships. To address these challenges, we propose a novel Multimodal-Aware Contrastive Learning (MACL) framework. MACL introduces a Dynamic Multi-Scale Attention (DMSA) mechanism that adaptively captures multi-level temporal and spatial features within each modality, improving the fidelity of intra-modal feature representations and the model's sensitivity to subtle emotional cues. MACL also incorporates a Modality-Aware Representation Learning (MARL) module that jointly learns modality-shared and modality-specific representations, enabling the model to preserve fine-grained local details while aligning global semantic information across heterogeneous modalities. Finally, a contrastive learning strategy based on Information Noise-Contrastive Estimation (InfoNCE) is used to maintain semantic consistency. Experimental results on the benchmark CMU Multimodal Opinion-Level Sentiment Intensity (CMU-MOSI) and CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) datasets show that MACL consistently outperforms existing state-of-the-art approaches, demonstrating its robustness and superior generalization.
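The abstract does not give the exact form of MACL's InfoNCE objective. As a rough illustration only, the sketch below implements the standard batch-wise InfoNCE loss in NumPy, assuming that the embeddings of the same utterance from two modalities (for example text and audio) form a positive pair and every other pairing in the batch acts as a negative. The function name `info_nce`, the temperature of 0.07, and the one-directional formulation are assumptions made for this example, not settings reported for MACL.

```python
import numpy as np


def info_nce(anchors: np.ndarray, positives: np.ndarray, temperature: float = 0.07) -> float:
    """Batch-wise InfoNCE loss (hypothetical helper, not MACL's published code).

    Row i of `positives` is assumed to be the positive for row i of `anchors`;
    all other rows in the batch serve as negatives. The temperature is an
    assumed value.
    """
    # L2-normalise each embedding so the dot product becomes cosine similarity
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # (batch, batch) similarity matrix, sharpened by the temperature
    logits = a @ p.T / temperature

    # Numerically stable row-wise log-softmax
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positive pairs lie on the diagonal; the loss is their mean negative log-probability
    return float(-np.mean(np.diag(log_probs)))


# Usage with synthetic embeddings (hypothetical data, for illustration only)
rng = np.random.default_rng(0)
text = rng.normal(size=(8, 16))                          # stand-in text embeddings
audio_aligned = text + 0.05 * rng.normal(size=(8, 16))   # audio close to its paired text
audio_random = rng.normal(size=(8, 16))                  # audio unrelated to the text
print(info_nce(text, audio_aligned), info_nce(text, audio_random))
```

The aligned pairs should yield a much lower loss than the unrelated ones, which is the pressure toward cross-modal semantic consistency that the abstract describes. Many cross-modal setups also average the loss over both directions (anchors to positives and positives to anchors); whether MACL does so is not stated in the abstract.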