With the rapid and extensive development of computer vision techniques, multimodal analysis is increasingly used for online fake news detection. To better understand the image–text relationship and its role in fake news detection, in this article we propose and evaluate four image–text similarities: textual similarity, semantic similarity, contextual similarity, and post-training similarity. The textual and semantic similarities capture the original image–text similarity in terms of the text information and the image caption information. The contextual similarity reflects the image–text similarity in the form of meaningful named entities. The post-training similarity shows how image–text similarity changes from before to after a fake news detection model is trained. By evaluating the proposed similarity measurements on three real-world datasets, we find that image–text similarity is higher for fake news than for real news in most cases. Furthermore, a comparison of models' performance validates the significance of visual information in online fake news detection. These findings may help explain the original purpose behind fake news creation and can be used as influential features for improving models' performance in the future.
Zhang et al. studied this question.
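The abstract does not specify how each similarity is computed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that an image caption and a set of named entities have already been obtained by some external captioning and entity-recognition step, and it substitutes two generic measures: bag-of-words cosine similarity between the news text and the caption, and Jaccard overlap between entity sets. All function names, variable names, and sample strings are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words count vectors.

    Assumption: a generic stand-in for a text-vs-caption similarity;
    the paper's actual measure is not given in the abstract.
    """
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no defined direction
    return dot / (norm_a * norm_b)


def jaccard_similarity(entities_a: set[str], entities_b: set[str]) -> float:
    """Jaccard overlap between two sets of named entities.

    Assumption: a generic stand-in for an entity-based similarity.
    """
    union = entities_a | entities_b
    if not union:
        return 0.0
    return len(entities_a & entities_b) / len(union)


if __name__ == "__main__":
    # Hypothetical inputs: the caption would come from an image
    # captioning model, the entity sets from an NER tool.
    news_text = "Flood waters cover the main street of the city"
    image_caption = "A street covered in flood waters"
    text_entities = {"city hall", "main street"}
    image_entities = {"main street"}

    print(cosine_similarity(news_text, image_caption))
    print(jaccard_similarity(text_entities, image_entities))
```

Scores of this kind could be compared across fake and real news items, or appended as extra features to a detection model, which is the use the abstract suggests for its similarity measurements.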