Inspired by human multisensory synergy, multimodal emotion recognition (MER) has advanced human–computer interaction by integrating complementary information from multiple sources. However, multimodal models often suffer from modality imbalance, which limits their performance. Existing methods rarely achieve both sufficient unimodal learning and balanced multimodal learning. Even when modality balance is addressed, optimization trajectories among modalities can still impair individual learning. To tackle these issues, we propose Dynamic Reassembly-Fusion (DRFusion), which comprises: (1) adaptive fine-grained reassembly to strengthen weak modalities and align gradient directions, and (2) uncertainty-aware fusion for robust multimodal integration. DRFusion achieves both unimodal sufficiency and multimodal balance by selecting weak modalities and performing batch-level reassembly. By explicitly modeling each modality’s predictive uncertainty, it effectively handles scenarios with both modality imbalance and insufficiency. Extensive experiments on benchmark datasets show that DRFusion outperforms state-of-the-art multimodal learning methods.
Yu et al. (Tue,) studied this question.