What type of study is this?

This is a quantitative study.

October 15, 2025 | Open Access

Beyond Spurious Cues: Adaptive Multi-Modal Fusion via Mixture-of-Experts for Robust Sarcasm Detection

Key Points

MM-MoE framework enhances sarcasm detection by integrating expert modules for improved generalization.
Experiments show that MM-MoE consistently outperforms state-of-the-art baselines when the data contain superficial spurious correlations.
The new MMSD3.0 and MMSD4.0 cross-dataset benchmarks, derived from MMSD and MMSD2.0, assess model robustness under varying distributions of spurious cues.
The adaptive mechanism in MM-MoE effectively captures modality-level incongruity, leading to superior performance.

Abstract

Sarcasm is a complex emotional expression often marked by semantic contrast and incongruity between textual and visual modalities. In recent years, multi-modal sarcasm detection (MMSD) has emerged as a vital task in affective computing. However, existing models frequently rely on superficial spurious cues, such as emojis or hashtags, during training and inference, which limits their ability to capture deeper semantic inconsistencies and undermines generalization to real-world scenarios. To tackle these challenges, we propose Multi-Modal Mixture-of-Experts (MM-MoE), a novel framework that integrates diverse expert modules through a global dynamic gating mechanism for adaptive cross-modal interaction and selective semantic fusion. This architecture allows the model to better capture modality-level incongruity. Furthermore, we introduce MMSD3.0 and MMSD4.0, two cross-dataset evaluation benchmarks derived from two open-source benchmark datasets, MMSD and MMSD2.0, to assess model robustness under varying distributions of spurious cues. Extensive experiments demonstrate that MM-MoE achieves strong performance and generalization ability, consistently outperforming state-of-the-art baselines when encountering superficial spurious correlations.
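The abstract describes the gating idea only at a high level, so the following is a minimal sketch of a generic softmax-gated mixture of experts over concatenated text and image features, not the authors' MM-MoE implementation. The class name `GatedFusion`, the linear experts, the random weights, and all dimensions are assumptions chosen purely for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, vec):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GatedFusion:
    """Hypothetical gated mixture of linear experts (illustrative only)."""

    def __init__(self, text_dim, image_dim, out_dim, num_experts, seed=0):
        rng = random.Random(seed)
        joint_dim = text_dim + image_dim
        # Each expert is a single linear map over the joint features (assumption).
        self.experts = [random_matrix(out_dim, joint_dim, rng)
                        for _ in range(num_experts)]
        # The gate scores every expert from the same joint features.
        self.gate = random_matrix(num_experts, joint_dim, rng)

    def forward(self, text_vec, image_vec):
        joint = list(text_vec) + list(image_vec)
        gate_weights = softmax(linear(self.gate, joint))
        expert_outputs = [linear(w, joint) for w in self.experts]
        out_dim = len(expert_outputs[0])
        # Fused output is the gate-weighted sum of the expert outputs.
        fused = [sum(g * out[i] for g, out in zip(gate_weights, expert_outputs))
                 for i in range(out_dim)]
        return fused, gate_weights


model = GatedFusion(text_dim=4, image_dim=3, out_dim=2, num_experts=3)
fused, gate_weights = model.forward([0.2, -0.1, 0.4, 0.0], [0.5, 0.3, -0.2])
```

Because the gate weights depend on the input, different text-image pairs can weight the experts differently, which is the sense in which such a fusion is adaptive. How the paper's experts and its global gate are actually structured is not specified in the abstract.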
