Automatic detection of multimodal fake news can help identify potential risks in cyberspace. Most existing multimodal fake news detection methods focus on exploiting the textual and visual features of news content and make little use of social context features, which play an important role in improving detection. To address this, we propose SARD, a new fake news detection method based on CLIP contrastive learning and multimodal semantic alignment. SARD uses multimodal learning techniques such as CLIP, together with cross-modal contrastive learning, to integrate features of news-oriented heterogeneous information networks (N-HIN) with multi-level textual and visual features in a unified framework for the first time. The framework aligns deep textual and visual features across modalities and also models cross-modal associations and semantic alignment between the different modalities. In addition, SARD improves detection by aligning semantic features between news content and N-HIN features, an aspect largely overlooked by existing methods. We evaluate SARD on three real-world datasets. Experimental results show that SARD significantly outperforms twelve state-of-the-art competitors, with average improvements of 2.89% in macro-F1 (Mac.F1) and 2.13% in accuracy over the leading baseline models across the three datasets.
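To make the idea of cross-modal contrastive alignment concrete, the following is a minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) loss over paired text and image embeddings. It is not the authors' implementation: the abstract does not specify SARD's loss, so the function names, the temperature value of 0.07, and the use of plain NumPy arrays in place of encoder outputs are all assumptions made for illustration. The same form of loss could in principle be applied to pairs of news-content and N-HIN embeddings, but that is likewise an assumption.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def symmetric_contrastive_loss(text_emb, image_emb, temperature=0.07):
    """CLIP-style loss (illustrative): row i of text_emb is assumed to be
    paired with row i of image_emb; all other rows in the batch act as
    negatives. The temperature is a hypothetical default, not from the paper.
    """
    t = l2_normalize(np.asarray(text_emb, dtype=float))
    v = l2_normalize(np.asarray(image_emb, dtype=float))
    logits = t @ v.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(t))                 # matching pairs lie on the diagonal
    loss_text_to_image = -log_softmax(logits)[idx, idx].mean()
    loss_image_to_text = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_text_to_image + loss_image_to_text)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(8, 16))
    image_aligned = text + 0.01 * rng.normal(size=(8, 16))
    image_shuffled = np.roll(image_aligned, shift=1, axis=0)  # break the pairing
    print("aligned pairs :", symmetric_contrastive_loss(text, image_aligned))
    print("shuffled pairs:", symmetric_contrastive_loss(text, image_shuffled))
```

Minimizing this quantity pulls matched text and image embeddings together and pushes mismatched pairs in the batch apart, so correctly paired inputs yield a lower loss than shuffled ones.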
Yan et al. (Wed,) studied this question.