What does this research mean for the field?

DRFusion enhances both unimodal sufficiency and multimodal balance in emotion recognition tasks. Novelty: novel finding. Consensus alignment: challenges consensus.

What question did this study set out to answer?

The aim is to improve emotion recognition by addressing challenges in multimodal learning, such as modality imbalance and insufficient learning.

February 21, 2026 | Open Access

DRFusion: Enhancing balanced and sufficient multimodal learning for human emotion recognition

Key Points

The aim is to improve emotion recognition by addressing challenges in multimodal learning, such as modality imbalance and insufficient learning.
Proposes the Dynamic Reassembly-Fusion (DRFusion) method for better integration of multiple information sources.
Uses adaptive fine-grained reassembly to strengthen weak modalities and align gradient directions.
Applies uncertainty-aware fusion to make multimodal integration more robust.
DRFusion demonstrates improved performance over existing multimodal learning methods.
Achieves sufficient learning for each unimodal input while maintaining balance across modalities.
Validated through extensive experiments on benchmark datasets, where it outperforms state-of-the-art methods.
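
To make the idea of "selecting weak modalities" at the batch level more concrete, here is a minimal sketch in Python. It is not the authors' procedure: the summary does not say how weakness is measured, so this sketch assumes a simple proxy (the mean softmax probability each modality's classifier assigns to the true class over a batch). The function names, the `margin` threshold, and the toy logits are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def batch_confidence(batch_logits, labels):
    """Mean probability assigned to the true class over one batch."""
    probs = [softmax(row)[y] for row, y in zip(batch_logits, labels)]
    return sum(probs) / len(probs)


def select_weak_modalities(logits_by_modality, labels, margin=0.05):
    """Flag modalities whose batch confidence trails the best one.

    ASSUMPTION: "weak" means trailing the strongest modality by more
    than `margin`; the paper may use a different criterion.
    """
    scores = {
        name: batch_confidence(rows, labels)
        for name, rows in logits_by_modality.items()
    }
    best = max(scores.values())
    weak = sorted(name for name, s in scores.items() if best - s > margin)
    return scores, weak


# Toy batch of two samples and two classes (made-up numbers).
labels = [0, 1]
logits = {
    "audio": [[2.0, 0.0], [0.0, 2.0]],  # confident and correct
    "text": [[0.1, 0.0], [0.0, 0.1]],   # correct but barely above chance
}
scores, weak = select_weak_modalities(logits, labels)
```

In this toy batch the text modality is flagged as weak because its confidence sits close to chance while audio is confident. A training loop could then give the flagged modalities extra attention; how DRFusion actually reassembles the batch is not specified in this summary.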

Abstract

Inspired by human multisensory synergy, multimodal emotion recognition (MER) has advanced human–computer interaction by integrating complementary information from multiple sources. However, multimodal models often suffer from modality imbalance, which limits their performance. Existing methods rarely achieve both sufficient unimodal learning and balanced multimodal learning. Even when modality balance is addressed, conflicting optimization trajectories among modalities can still impair individual learning. To tackle these issues, we propose Dynamic Reassembly-Fusion (DRFusion), which comprises: (1) adaptive fine-grained reassembly to strengthen weak modalities and align gradient directions, and (2) uncertainty-aware fusion for robust multimodal integration. DRFusion enhances both unimodal sufficiency and multimodal balance by selecting weak modalities and performing batch-level reassembly. By explicitly modeling each modality’s predictive uncertainty, DRFusion effectively handles scenarios with both modality imbalance and insufficiency. Extensive experiments on benchmark datasets show that DRFusion outperforms state-of-the-art multimodal learning methods.
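
As an illustration of what uncertainty-aware fusion can look like, the sketch below weights each modality's class probabilities by its confidence, so that uncertain modalities contribute less to the fused prediction. This is one common way to realize the idea, not the formulation used in DRFusion: the abstract does not say how predictive uncertainty is modeled, so normalized Shannon entropy is assumed here, and the names `normalized_entropy` and `fuse`, as well as the sample logits, are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy divided by log(K), so the value lies in [0, 1].

    Requires at least two classes (K >= 2).
    """
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def fuse(logits_by_modality, eps=1e-8):
    """Fuse per-modality predictions with confidence-based weights.

    ASSUMPTION: confidence = 1 - normalized entropy. This is a stand-in
    for whatever uncertainty estimate the paper actually uses.
    """
    probs = {m: softmax(l) for m, l in logits_by_modality.items()}
    conf = {m: 1.0 - normalized_entropy(p) + eps for m, p in probs.items()}
    total = sum(conf.values())
    weights = {m: c / total for m, c in conf.items()}
    num_classes = len(next(iter(probs.values())))
    fused = [
        sum(weights[m] * probs[m][i] for m in probs)
        for i in range(num_classes)
    ]
    return fused, weights


# One sample, three emotion classes (made-up logits).
sample = {
    "audio": [4.0, 0.0, 0.0],  # sharply peaked, low uncertainty
    "video": [0.2, 0.1, 0.0],  # nearly uniform, high uncertainty
}
fused, weights = fuse(sample)
```

Because the weights sum to one and each modality's probabilities sum to one, the fused vector is itself a valid probability distribution. Here the nearly uniform video prediction receives a much smaller weight than the sharply peaked audio prediction.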
