What type of study is this?

This is a quantitative study.

October 16, 2025 | Open Access

MM-FusionNet: Context-Aware Dynamic Fusion for Multi-modal Fake News Detection with Large Vision-Language Models

Junhao He (Guangzhou University of Chinese Medicine), Tianyu Liu (Chengdu University of Traditional Chinese Medicine), Jingyuan Zhao (Dalian University of Technology)

Key Points

MM-FusionNet achieves a state-of-the-art F1-score of 0.938 in multi-modal fake news detection, surpassing existing multi-modal baselines by approximately 0.5%.
The context-aware dynamic fusion module effectively adapts the importance of textual and visual features, improving detection performance.
The model is evaluated on the large-scale Multi-modal Fake News Dataset (LMFND), comprising 80,000 samples, and further analysis shows it is robust to modality perturbations.
Results highlight that MM-FusionNet's performance approaches human-level accuracy, suggesting strong practical applicability.

Abstract

The proliferation of multi-modal fake news on social media poses a significant threat to public trust and social stability. Traditional detection methods, primarily text-based, often fall short due to the deceptive interplay between misleading text and images. While Large Vision-Language Models (LVLMs) offer promising avenues for multi-modal understanding, effectively fusing diverse modal information, especially when their importance is imbalanced or contradictory, remains a critical challenge. This paper introduces MM-FusionNet, an innovative framework leveraging LVLMs for robust multi-modal fake news detection. Our core contribution is the Context-Aware Dynamic Fusion Module (CADFM), which employs bi-directional cross-modal attention and a novel dynamic modal gating network. This mechanism adaptively learns and assigns importance weights to textual and visual features based on their contextual relevance, enabling intelligent prioritization of information. Evaluated on the large-scale Multi-modal Fake News Dataset (LMFND) comprising 80,000 samples, MM-FusionNet achieves a state-of-the-art F1-score of 0.938, surpassing existing multi-modal baselines by approximately 0.5% and significantly outperforming single-modal approaches. Further analysis demonstrates the model's dynamic weighting capabilities, its robustness to modality perturbations, and performance remarkably close to human-level, underscoring its practical efficacy and interpretability for real-world fake news detection.
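The fusion idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' CADFM implementation: the abstract gives no architectural details, so everything below is an assumption made for illustration, including the function names (`cross_attend`, `dynamic_fusion`), the use of unprojected scaled dot-product attention, mean pooling, and a single linear gate followed by a softmax over the two modalities. Text and image tokens are assumed to share the same feature dimension.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attend(queries, keys_values):
    # Each query token attends over the tokens of the OTHER modality.
    # Assumption: keys and values are the raw tokens (no learned projections).
    scale = math.sqrt(len(queries[0]))
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys_values])
        attended.append([
            sum(w * v[i] for w, v in zip(weights, keys_values))
            for i in range(len(q))
        ])
    return attended

def mean_pool(tokens):
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]

def dynamic_fusion(text_tokens, image_tokens, gate_w, gate_b):
    # Bi-directional cross-modal attention: text -> image and image -> text.
    text_ctx = mean_pool(cross_attend(text_tokens, image_tokens))
    image_ctx = mean_pool(cross_attend(image_tokens, text_tokens))
    # Hypothetical gating network: one linear layer on the concatenated
    # context vectors yields two logits; softmax turns them into weights.
    joint = text_ctx + image_ctx
    logits = [dot(row, joint) + b for row, b in zip(gate_w, gate_b)]
    w_text, w_image = softmax(logits)
    # Fused feature is the weighted sum of the two modality summaries.
    fused = [w_text * t + w_image * v for t, v in zip(text_ctx, image_ctx)]
    return fused, (w_text, w_image)

# Toy inputs: 2 text tokens and 3 image tokens, feature dimension 2.
text_tokens = [[1.0, 0.0], [0.5, 0.5]]
image_tokens = [[0.0, 1.0], [0.2, 0.8], [0.9, 0.1]]

# An all-zero gate treats both modalities equally (weights 0.5 and 0.5).
zero_w = [[0.0] * 4, [0.0] * 4]
fused, weights = dynamic_fusion(text_tokens, image_tokens, zero_w, [0.0, 0.0])

# A large bias on the first logit makes the gate favor the text modality.
_, text_heavy = dynamic_fusion(text_tokens, image_tokens, zero_w, [10.0, 0.0])
```

In a trained model the gate parameters would be learned, so the weights would vary per sample with the content of the text and image. The returned weight pair is also what would make such a gate inspectable, in line with the interpretability claim above.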
