What does this research mean for the field?

An Interactive Two-Stream Network (ITSNet) utilizing cross-modality inconsistency representations and temporal embedding modules achieves superior in-dataset and cross-dataset generalization in deepfake detection compared to state-of-the-art methods. Novelty: ClaimNovelty.METHODOLOGICAL. Consensus alignment: ConsensusAlignment.NEUTRAL.

April 24, 2023

Interactive Two-Stream Network Across Modalities for Deepfake Detection

Key Points

Key points are not available for this paper at this time.

Abstract

As face forgery techniques have become more mature, the proliferation of deepfakes may threaten the security of human society. Although existing deepfake detection methods achieve good performance for in-dataset evaluation, it remains to be improved in the generalization ability, where the representation of the imperceptible artifacts plays a significant role. In this paper, we propose an Interactive Two-Stream Network (ITSNet) to explore the discriminant inconsistency representation from the perspective of cross-modality. In particular, the patch-wise Decomposable Discrete Cosine Transform (DDCT) is adopted to extract fine-grained high-frequency clues, and information from different modalities communicates with each other via a designed interaction module. To perceive the temporal inconsistency, we first develop a Short-term Embedding Module (SEM) to refine subtle local inconsistency representation between adjacent frames, and then a Long-term Embedding Module (LEM) is designed to further refine the erratic temporal inconsistency representation from the long-range perspective. Extensive experimental results conducted on three public datasets show that ITSNet outperforms the state-of-the-art methods both in terms of in-dataset and cross-dataset evaluations.

Mark Helpful

Bookmark

Relay