January 1, 2024Open Access

Improving Video Vision Transformer for Deepfake Video Detection Using Facial Landmark, Depthwise Separable Convolution and Self Attention

Key Points

Key points are not available for this paper at this time.

Abstract

In this paper, we present our result of research in video deepfake detection. We built a deepfake detection system to detect whether a video is a deepfake or real. The deepfake detection algorithm still struggle in providing a sufficient accuracy values, especially in challenging deepfake dataset. Our deepfake detection system utilized spatiotemporal feature that extracted using Video Vision Transformer (ViViT). The main contribution of our research is providing a deepfake detection system that based on ViViT architecture and using landmark area images for the input of the system. Our system extracted the feature from a number of spatial features. The spatial feature was extracted using Depthwise Separable Convolution (DSC) block combined with Convolution Block Attention Module (CBAM) from tubelet. The tubelet was a representation of facial landmark area that was extracted from the input video. In our system, we used 25 facial landmark area for an input video. In our experiment we used Celeb-DF version 2 dataset because it is considered to be a challenging deepfake dataset. We conducted augmentation to the dataset, so we obtained 8335 videos for training set, 390 videos for validation set, and 1123 videos for testing set. We trained our deepfake detection system using Adam optimizer, with learning rate of 10-4 and 100 epoch. From the experiment, we obtained the accuracy score of 87.18% and F1 score of 92.52%. We also conducted the ablation study to display the effect of each part of our model to the overall system performance. From this research, we obtained that by using landmark area images, our ViViT based deepfake detection system had a good performance in detecting deepfake videos.

Connected Papers

Building similarity graph...

Analyzing shared references across papers

Discussion

Authors

Kurniawan Nur Ramadhani

Telkom University

Rinaldi Munir

Bandung Institute of Technology

Nugraha Priya Utama

Bandung Institute of Technology

Journals

IEEE Access

Actions

Institutions

Bandung Institute of Technology

Telkom University

References and Citations

Connected Papers

Building similarity graph...

Analyzing shared references across papers

Improving Video Vision Transformer for Deepfake Video Detection Using Facial Landmark, Depthwise Separable Convolution and Self Attention

Key Points

Abstract

Citation Network

Connected Papers

Discussion

Authors

Journals

Actions

Institutions

References and Citations

Citation Network

Connected Papers

Discussion

Cite this study