August 12, 2025

Multimodal Fusion of Behavioral and Physiological Signals for Enhanced Emotion Recognition Via Feature Decoupling and Knowledge Transfer

Key Points

Our method achieves state-of-the-art emotion recognition performance, with improvements in predictive accuracy confirmed as statistically significant by paired two-tailed t-tests.
Validated on two benchmark datasets, MAHNOB-HCI and DEAP, for both regression and classification under subject-dependent and subject-independent protocols, demonstrating robustness and flexibility.
Introduces a dual-stream representation learning architecture that separates modality-shared (common) and modality-specific (unique) features, enhancing model interpretability.
Uses graph-based distillation over a directed, learnable modality graph to optimize cross-modal knowledge transfer, improving overall system performance.
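The dual-stream idea in the key points separates features shared across modalities from features unique to each one. A minimal sketch of two losses often used for this kind of separation follows. It assumes, purely for illustration, a cosine-based similarity term that pulls shared vectors together and a squared-cosine orthogonality term that pushes each modality's shared and specific vectors apart; the function names, modality names, and both loss definitions are hypothetical and are not the paper's published objective. Feature vectors are plain Python lists rather than learned tensors.

```python
import math

# Hypothetical sketch: both losses are assumptions modeled on common
# shared/specific disentanglement objectives, not the paper's exact ones.


def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v, eps=1e-12):
    """Cosine similarity; eps guards against division by zero."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)) + eps)


def similarity_loss(shared):
    """Pull the modality-shared features of one sample together.

    shared maps a modality name to its shared-subspace vector (needs at
    least two modalities). Returns the mean of (1 - cosine) over all
    unordered modality pairs, which is 0 when all shared vectors align.
    """
    names = sorted(shared)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return sum(1.0 - cosine(shared[a], shared[b]) for a, b in pairs) / len(pairs)


def orthogonality_loss(shared, specific):
    """Push each modality's shared and specific vectors apart.

    Squared cosine is 0 when the two vectors are orthogonal, i.e. when the
    specific stream carries no component along the shared direction.
    """
    return sum(cosine(shared[m], specific[m]) ** 2 for m in shared) / len(shared)


# Hypothetical example with two modalities (names are illustrative only).
shared = {"eeg": [1.0, 0.0, 1.0], "face": [1.0, 0.0, 1.0]}
specific = {"eeg": [0.0, 1.0, 0.0], "face": [1.0, 0.0, -1.0]}
print(similarity_loss(shared))                # close to 0: shared vectors agree
print(orthogonality_loss(shared, specific))  # 0.0: streams are orthogonal
```

In a full model these terms would typically be added, with tunable weights, to the task loss so the encoders learn the split; that weighting is likewise an assumption here.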

Abstract

Multimodal emotion recognition has emerged as a promising direction for capturing the complexity of human affective states by integrating physiological and behavioral signals. However, challenges remain in addressing feature redundancy, modality heterogeneity, and insufficient inter-modal supervision. In this paper, we propose a novel Multimodal Disentangled Knowledge Distillation framework that explicitly disentangles modality-shared and modality-specific features and enhances cross-modal knowledge transfer via a graph-based distillation module. Specifically, we introduce a dual-stream representation learning architecture that separates common and unique subspaces across modalities. To facilitate effective information interaction, we design a directed and learnable modality graph, where each edge represents the semantic transfer strength from one modality to another. We validate our method on two benchmark datasets, MAHNOB-HCI and DEAP, for both regression and classification tasks, under subject-dependent and subject-independent protocols. Experimental results demonstrate that our method achieves state-of-the-art performance, with statistical significance confirmed by paired two-tailed t-tests. In addition, qualitative analysis of the learned modality graph and t-SNE embeddings further illustrates the effectiveness of our feature disentanglement and dynamic knowledge transfer design. This work offers a unified, interpretable, and robust framework for multimodal emotion understanding and lays a foundation for affective computing in real-world human-machine interaction scenarios.
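The directed, learnable modality graph described above can be illustrated with a minimal sketch. It assumes each modality outputs class logits, that a hypothetical matrix `edge_logits` holds one learnable score per directed edge, that a softmax over each target's incoming edges turns those scores into transfer strengths, and that the distillation loss weights a temperature-scaled KL divergence along every edge. These choices (the per-target softmax, the KL objective, the temperature of 2.0, all function names) are illustrative assumptions; the paper's exact formulation may differ, and for the regression tasks a squared-error term could take the place of the KL divergence.

```python
import math

# Hypothetical sketch of distillation over a directed, learnable modality
# graph. The per-target softmax, the KL objective, and the temperature are
# illustrative assumptions, not the paper's published design.


def softmax(xs, temperature=1.0):
    """Numerically stable softmax with optional temperature scaling."""
    scaled = [x / temperature for x in xs]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def edge_weights(edge_logits):
    """Turn raw edge scores into transfer strengths (needs >= 2 modalities).

    edge_logits[i][j] scores the directed edge from source modality i to
    target modality j. For each target j, the weights over all sources
    i != j are softmax-normalized to sum to 1; self-loops get weight 0.
    """
    n = len(edge_logits)
    weights = [[0.0] * n for _ in range(n)]
    for j in range(n):
        sources = [i for i in range(n) if i != j]
        probs = softmax([edge_logits[i][j] for i in sources])
        for i, w in zip(sources, probs):
            weights[i][j] = w
    return weights


def graph_distillation_loss(logits, edge_logits, temperature=2.0):
    """Weighted sum of KL(source || target) over every directed edge.

    logits[m] holds the class logits predicted from modality m.
    """
    probs = [softmax(z, temperature) for z in logits]
    weights = edge_weights(edge_logits)
    n = len(logits)
    return sum(
        weights[i][j] * kl_divergence(probs[i], probs[j])
        for i in range(n)
        for j in range(n)
        if i != j
    )


# Hypothetical three-modality example (e.g. EEG, peripheral signals, face).
logits = [[2.0, 0.5], [1.0, 1.0], [0.2, 1.8]]
edge_logits = [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 2.0, 0.0]]
print(edge_weights(edge_logits))
print(graph_distillation_loss(logits, edge_logits))
```

In an actual training loop, `edge_logits` would be trainable parameters updated jointly with the encoders, and the source-side distributions would typically be detached so that gradients flow only into the target modality. Plain Python lists cannot express that, so this sketch only computes the loss value; inspecting the normalized weights is one way the learned graph could be examined qualitatively.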
