Fake news has caused significant harm and disruption across various sectors of society. With the rapid advancement of the Internet and social media platforms, both academic and industrial communities have shown growing interest in multimodal fake news detection. In this work, we propose MERF (Multimodal Evidence Retrieval and Forensics with LVLM), a unified framework for multimodal fake news detection that leverages the reasoning capabilities of Large Vision-Language Models (LVLMs). While LVLMs outperform traditional Large Language Models (LLMs) in processing multimodal content, our study reveals that their reasoning abilities remain limited in the absence of sufficient supporting evidence. MERF addresses this challenge by integrating web-based content retrieval, reverse image search, and image manipulation detection into a coherent pipeline, enabling the model to generate informed and explainable veracity judgments. Specifically, our approach performs cross-modal consistency checking, retrieves corroborative information for both textual and visual content, and applies forensic analysis to detect potential visual forgeries. The aggregated evidence is then fed into the LVLM, facilitating comprehensive reasoning and evidence-based decision-making. Experimental results on two public benchmark datasets—Weibo and Twitter—demonstrate that MERF consistently outperforms state-of-the-art baselines across all major evaluation metrics, achieving substantial improvements in accuracy, robustness, and interpretability.
Dong et al. (Wed,) studied this question.
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: