What type of study is this?

This is a quantitative study.

September 12, 2025

Open Access

Deep Temporal Features and Multi-Level Cross-Modal Attention Fusion for Multimodal Sentiment Analysis

Key Points

The proposed method enhances sentiment analysis by optimizing multimodal feature extraction and attention mechanisms.
Experiments on the CMU-MOSI and CMU-MOSEI datasets show significant improvements in correlation and accuracy.
Deep temporal features are extracted using bidirectional LSTMs and attention models to capture diverse information.
A collaborative loss function aligns cross-modal features, further boosting the efficiency of multimodal sentiment analysis.
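
The correlation and accuracy figures mentioned above are usually computed from real-valued sentiment scores; CMU-MOSI and CMU-MOSEI label each utterance on a scale from -3 to 3. The following is a minimal sketch of how such metrics could be computed, not the paper's evaluation code. The function names, the toy scores, the choice to treat a score of 0 or above as positive, and the use of positive-class F1 (papers often report a weighted F1 instead) are all assumptions made for illustration.

```python
import math


def pearson_corr(preds, labels):
    """Pearson correlation between predicted and gold sentiment scores."""
    n = len(preds)
    mean_p = sum(preds) / n
    mean_l = sum(labels) / n
    cov = sum((p - mean_p) * (l - mean_l) for p, l in zip(preds, labels))
    var_p = sum((p - mean_p) ** 2 for p in preds)
    var_l = sum((l - mean_l) ** 2 for l in labels)
    return cov / math.sqrt(var_p * var_l)


def binary_accuracy_and_f1(preds, labels):
    """Binary accuracy and positive-class F1 after thresholding scores at 0.

    Assumption: a score >= 0 counts as positive sentiment. Other protocols
    (for example, dropping neutral items) exist and give different numbers.
    """
    pred_pos = [p >= 0 for p in preds]
    true_pos = [l >= 0 for l in labels]
    tp = sum(p and t for p, t in zip(pred_pos, true_pos))
    fp = sum(p and not t for p, t in zip(pred_pos, true_pos))
    fn = sum(t and not p for p, t in zip(pred_pos, true_pos))
    correct = sum(p == t for p, t in zip(pred_pos, true_pos))
    accuracy = correct / len(preds)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Toy scores in [-3, 3]; these are made-up values, not results from the paper.
example_preds = [2.1, -1.3, 0.4, -0.2, 1.0]
example_labels = [2.4, -1.0, -0.6, -0.8, 1.2]
example_corr = pearson_corr(example_preds, example_labels)
example_acc, example_f1 = binary_accuracy_and_f1(example_preds, example_labels)
```

Correlation rewards predictions that track the gold scores continuously, while accuracy and F1 only look at the sign of each score, which is why the three metrics are typically reported together.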

Abstract

Multimodal sentiment analysis suffers from insufficient multimodal feature extraction and from limited semantic diversity and interaction across modalities. To address these challenges, this paper introduces Deep Temporal Features and Multi-Level Cross-Modal Attention Fusion (DTMCAF). First, a deep temporal feature extractor is developed: a multimodal temporal modeling network that combines bidirectional LSTMs with multi-head self-attention to capture features from each modality. Next, hierarchical cross-modal attention mechanisms and feature-enhancement attention modules are designed to enable thorough information exchange between modalities. Gated fusion and multi-layer feature transformations are then employed to strengthen the multimodal representations. Finally, a multi-component collaborative loss function is proposed to align cross-modal features and optimize sentiment representations. Comprehensive experiments on the CMU-MOSI and CMU-MOSEI datasets demonstrate that the proposed method outperforms current state-of-the-art techniques in correlation, accuracy, and F1 score, indicating more accurate multimodal sentiment prediction.
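
To make the cross-modal attention and gated fusion steps more concrete, here is a minimal single-head sketch in plain Python. It is an illustration under stated assumptions, not the DTMCAF implementation: the learned query, key, and value projections and the multi-head and hierarchical structure are omitted, the gate uses hypothetical scalar parameters (`w_own`, `w_att`, `bias`) where a real model would learn a weight matrix, and the feature values are made up.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cross_modal_attention(query_seq, context_seq):
    """Each query step (e.g. a text token) attends over all context steps
    (e.g. audio frames) with scaled dot-product attention.

    Assumption: both sequences share the same feature size, and learned
    projections are left out to keep the sketch short.
    """
    d = len(query_seq[0])
    attended = []
    for q in query_seq:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in context_seq]
        weights = softmax(scores)
        attended.append([sum(w * v[j] for w, v in zip(weights, context_seq))
                         for j in range(d)])
    return attended


def gated_fusion(own, attended, w_own=1.0, w_att=-1.0, bias=0.0):
    """Blend a modality's own features with the attended features.

    The gate g = sigmoid(w_own * a + w_att * b + bias) is computed per
    element, and the output is g * a + (1 - g) * b. The scalar gate
    parameters are hypothetical stand-ins for learned weights.
    """
    fused = []
    for own_step, att_step in zip(own, attended):
        step = []
        for a, b in zip(own_step, att_step):
            g = sigmoid(w_own * a + w_att * b + bias)
            step.append(g * a + (1.0 - g) * b)
        fused.append(step)
    return fused


# Toy inputs: 3 text steps and 2 audio steps, feature size 2 (made-up values).
text_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio_feats = [[0.5, 0.5], [1.0, -1.0]]
text_attending_audio = cross_modal_attention(text_feats, audio_feats)
fused_text = gated_fusion(text_feats, text_attending_audio)
```

The output keeps one fused vector per text step, so it can be passed to later layers in place of the original text features. Because the gate is a value between 0 and 1, each fused element stays between the modality's own feature and the attended one, which lets the model decide per element how much cross-modal information to admit.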

Authors

Min Zhu

Shanghai Jiao Tong University
