The growing prevalence of multimodal misinformation, in which claims are supported by both text and images, poses significant challenges to fact-checking systems that rely primarily on textual evidence. In this work, we propose MultiCheck, a unified framework for fine-grained multimodal fact verification designed to reason over structured textual and visual signals. Our architecture combines dedicated encoders for text and images with a fusion module that captures cross-modal relationships through element-wise interactions. A classification head then predicts the veracity of a claim, and training is supported by a contrastive learning objective that encourages semantic alignment between claim-evidence pairs in a shared latent space. We evaluate our approach on the Factify 2 dataset, where it achieves a weighted F1 score of 0.84 and substantially outperforms the baseline. These results highlight the effectiveness of explicit multimodal reasoning and demonstrate the potential of our approach for scalable and interpretable fact-checking in complex, real-world scenarios.
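The pipeline described in the abstract (encoders, element-wise fusion, classification head, contrastive objective) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the fusion features, the loss, or any dimensions, so the concatenation `[t, v, t*v, |t-v|]`, the InfoNCE-style loss, the temperature, the embedding size, and the five-class output are all assumptions chosen for illustration, and random vectors stand in for real encoder outputs.

```python
import numpy as np


def fuse(text_emb, image_emb):
    """Element-wise interaction features (assumed form: product and absolute difference)."""
    return np.concatenate(
        [text_emb, image_emb, text_emb * image_emb, np.abs(text_emb - image_emb)],
        axis=-1,
    )


def classify(fused, weights, bias):
    """Linear classification head followed by a numerically stable softmax."""
    logits = fused @ weights + bias
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)


def info_nce(claim_emb, evidence_emb, temperature=0.1):
    """InfoNCE-style contrastive loss (an assumed stand-in for the paper's objective).

    Row i of claim_emb is treated as the positive match for row i of evidence_emb;
    every other row in the batch serves as a negative.
    """
    c = claim_emb / np.linalg.norm(claim_emb, axis=-1, keepdims=True)
    e = evidence_emb / np.linalg.norm(evidence_emb, axis=-1, keepdims=True)
    sim = (c @ e.T) / temperature
    sim = sim - sim.max(axis=-1, keepdims=True)
    log_probs = sim - np.log(np.exp(sim).sum(axis=-1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


# Hypothetical stand-ins for encoder outputs; real inputs would come from
# pretrained text and image encoders.
rng = np.random.default_rng(0)
batch, dim, num_classes = 4, 8, 5  # all three values are illustrative assumptions
text_emb = rng.normal(size=(batch, dim))
image_emb = rng.normal(size=(batch, dim))

fused = fuse(text_emb, image_emb)                      # shape (batch, 4 * dim)
weights = rng.normal(size=(4 * dim, num_classes)) * 0.1
probs = classify(fused, weights, np.zeros(num_classes))
loss = info_nce(text_emb, image_emb)
```

Concatenating the product and absolute difference alongside the raw embeddings is one common way to expose agreement and disagreement between modalities to a linear head; the actual fusion module in MultiCheck may differ.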
Aditya Kishore
Gaurav Kumar
Jasabanta Patro
Kishore et al. studied this question.
www.synapsesocial.com/papers/68f12bfb2107091eab27a379 — DOI: https://doi.org/10.48550/arxiv.2508.05097