October 1, 2024

DSViT: An Enhanced Transformer Model for Deepfake Detection

Abstract

The rapid development of artificial intelligence and deep learning models has enabled the creation of highly realistic fake images and videos, posing significant threats to information security and safety. Accurate detection of such forged content is crucial to prevent the spread of misinformation and to protect the integrity of digital media. Although advanced models such as the Vision Transformer (ViT) and the Convolutional Vision Transformer (CViT) have been applied to this problem, limitations remain. In this paper, we introduce DSViT (Deepfake Detection with SC-based Convolutional Vision Transformer), a novel model that builds on CViT to improve deepfake detection. The model integrates convolutional layers and an SCConvolution block with the ViT architecture. We conducted experiments on the Deepfake Detection Challenge (DFDC) dataset and compared the results with those of the CViT model to demonstrate the effectiveness of the proposed model.
