What question did this study set out to answer?

The study aims to improve fake news detection by integrating multimodal evidence retrieval and analysis using Large Vision-Language Models (LVLMs).

March 27, 2026 | Open Access

Multimodal Fake News Detection via Evidence Retrieval and Visual Forensics with Large Vision-Language Models

Key Points

Developed the MERF framework combining evidence retrieval, reverse image search, and visual forensics.
Conducted cross-modal consistency checking to validate information from text and images.
Performed forensic analysis to identify visual manipulations.
Evaluated on two public benchmark datasets, Weibo and Twitter.
MERF outperformed state-of-the-art baselines across all major evaluation metrics.
Reported substantial improvements in accuracy and robustness.
Improved the interpretability of veracity judgments by grounding them in retrieved evidence.

Abstract

Fake news has caused significant harm and disruption across various sectors of society. With the rapid advancement of the Internet and social media platforms, both academic and industrial communities have shown growing interest in multimodal fake news detection. In this work, we propose MERF (Multimodal Evidence Retrieval and Forensics with LVLM), a unified framework for multimodal fake news detection that leverages the reasoning capabilities of Large Vision-Language Models (LVLMs). While LVLMs outperform traditional Large Language Models (LLMs) in processing multimodal content, our study reveals that their reasoning abilities remain limited in the absence of sufficient supporting evidence. MERF addresses this challenge by integrating web-based content retrieval, reverse image search, and image manipulation detection into a coherent pipeline, enabling the model to generate informed and explainable veracity judgments. Specifically, our approach performs cross-modal consistency checking, retrieves corroborative information for both textual and visual content, and applies forensic analysis to detect potential visual forgeries. The aggregated evidence is then fed into the LVLM, facilitating comprehensive reasoning and evidence-based decision-making. Experimental results on two public benchmark datasets—Weibo and Twitter—demonstrate that MERF consistently outperforms state-of-the-art baselines across all major evaluation metrics, achieving substantial improvements in accuracy, robustness, and interpretability.
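The evidence-aggregation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `NewsItem`, `Evidence`, `collect_evidence`, `build_prompt`, and the four stub stages are hypothetical, and the stubs return canned placeholder text rather than calling any real search, forensics, or LVLM service. The sketch only shows the general shape of running several evidence stages and folding their output into one prompt.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class NewsItem:
    text: str        # the textual claim of the post
    image_path: str  # path to the attached image (never opened in this sketch)


@dataclass
class Evidence:
    source: str   # which pipeline stage produced this note
    summary: str  # short human-readable finding


# A stage maps a news item to zero or more evidence notes (hypothetical interface).
Stage = Callable[[NewsItem], List[Evidence]]


def collect_evidence(item: NewsItem, stages: List[Stage]) -> List[Evidence]:
    """Run every stage and concatenate the results in stage order."""
    collected: List[Evidence] = []
    for stage in stages:
        collected.extend(stage(item))
    return collected


def build_prompt(item: NewsItem, evidence: List[Evidence]) -> str:
    """Fold the claim and all evidence notes into a single text prompt."""
    lines = ["Claim: " + item.text, "Evidence:"]
    if evidence:
        lines += [f"- [{e.source}] {e.summary}" for e in evidence]
    else:
        lines.append("- (none retrieved)")
    lines.append("Question: Is the claim real or fake? Cite the evidence used.")
    return "\n".join(lines)


# Placeholder stages: assumptions standing in for the four components
# named in the abstract. Each returns canned text only.
def consistency_stub(item: NewsItem) -> List[Evidence]:
    return [Evidence("cross-modal consistency", "placeholder: text/image agreement not computed")]


def web_retrieval_stub(item: NewsItem) -> List[Evidence]:
    return [Evidence("web retrieval", "placeholder: no pages fetched")]


def reverse_image_stub(item: NewsItem) -> List[Evidence]:
    return [Evidence("reverse image search", "placeholder: no matches looked up")]


def forensics_stub(item: NewsItem) -> List[Evidence]:
    return [Evidence("visual forensics", "placeholder: manipulation check not run")]


def demo_prompt() -> str:
    item = NewsItem(text="Example claim", image_path="example.jpg")
    stages = [consistency_stub, web_retrieval_stub, reverse_image_stub, forensics_stub]
    return build_prompt(item, collect_evidence(item, stages))


if __name__ == "__main__":
    print(demo_prompt())
```

In a real system each stub would be replaced by an actual retrieval or forensic component, and the resulting prompt would be passed to an LVLM together with the image. The explicit "(none retrieved)" line reflects the abstract's observation that the model's reasoning is limited when supporting evidence is missing.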

Cite This Study

Dong et al. studied this question.

synapsesocial.com/papers/69c61f8515a0a509bde18058
https://doi.org/10.3390/info17040317
