What question did this study set out to answer?

This research aims to develop an effective detection method for deepfake images using advanced AI techniques.

March 3, 2026 · Open Access

Artificial Intelligence-Driven Deepfake Detection: Hybrid Self-Supervised Learning and Swin Transformer for Explainable Fake Image Classification

Key Points

Used a self-supervised convolutional autoencoder (CAE), whose encoder compresses input images and whose decoder reconstructs them, to extract robust features.
Trained the CAE from scratch with the Adam optimizer, using a batch size of 16 for 50 epochs.
Applied data augmentation during training to improve feature learning and model robustness.
Combined features from the frozen CAE encoder and a pre-trained Swin-Large Transformer for classification.
Evaluated the method on five challenging deepfake datasets: OpenForensics, Flickr-140, Biometric, Liveness, and Deep Fake Detection (DFD).
Achieved accuracies of 0.999, 0.996, 0.984, 0.988, and 0.917 on these datasets, respectively.
Demonstrated the effective feature extraction capabilities of the hybrid model.
Indicated the method's potential to reduce misinformation and strengthen trust in media.
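To make the self-supervised objective concrete, the following is a minimal NumPy sketch of a convolutional autoencoder trained with Adam on a mean-squared reconstruction loss, using the batch size of 16 and the 50 epochs reported in the abstract. It is not the authors' architecture: the single 2x2, stride-2 convolutional encoder with a matching transposed-convolution decoder, the 8 latent channels, the learning rate of 1e-3, the weight initialization, the names `ToyCAE` and `train`, and the random 8x8 placeholder images are all assumptions, chosen so that the gradients can be written by hand. Data augmentation is omitted because the specific transforms are not stated.

```python
import numpy as np


def to_patches(x):
    """(N, H, W) -> (N, H/2, W/2, 4): flatten each non-overlapping 2x2 patch."""
    n, h, w = x.shape
    return (x.reshape(n, h // 2, 2, w // 2, 2)
             .transpose(0, 1, 3, 2, 4)
             .reshape(n, h // 2, w // 2, 4))


def from_patches(p):
    """Inverse of to_patches: (N, H/2, W/2, 4) -> (N, H, W)."""
    n, hh, ww, _ = p.shape
    return (p.reshape(n, hh, ww, 2, 2)
             .transpose(0, 1, 3, 2, 4)
             .reshape(n, hh * 2, ww * 2))


class ToyCAE:
    """Hypothetical one-layer CAE: 2x2/stride-2 conv encoder (ReLU) and a
    2x2/stride-2 transposed-conv decoder. Without kernel overlap, both layers
    reduce to per-patch matrix products, so gradients are written by hand."""

    def __init__(self, channels=8, seed=0):
        rng = np.random.default_rng(seed)
        self.params = [rng.normal(0.0, 0.1, (4, channels)), np.zeros(channels),
                       rng.normal(0.0, 0.1, (channels, 4)), np.zeros(4)]

    def encode(self, x):
        w_e, b_e, _, _ = self.params
        return np.maximum(to_patches(x) @ w_e + b_e, 0.0)  # latent maps (N, H/2, W/2, C)

    def reconstruct(self, x):
        _, _, w_d, b_d = self.params
        return from_patches(self.encode(x) @ w_d + b_d)

    def loss_and_grads(self, x):
        """Mean-squared reconstruction error and its gradients (no labels needed)."""
        w_e, b_e, w_d, b_d = self.params
        p = to_patches(x).reshape(-1, 4)
        pre = p @ w_e + b_e
        z = np.maximum(pre, 0.0)
        diff = (z @ w_d + b_d) - p
        d_r = 2.0 * diff / diff.size
        d_pre = (d_r @ w_d.T) * (pre > 0)
        grads = [p.T @ d_pre, d_pre.sum(axis=0), z.T @ d_r, d_r.sum(axis=0)]
        return float(np.mean(diff ** 2)), grads


class Adam:
    """Standard Adam update (Kingma & Ba) applied in place to a list of arrays."""

    def __init__(self, params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.params, self.lr, self.beta1, self.beta2, self.eps = params, lr, beta1, beta2, eps
        self.m = [np.zeros_like(p) for p in params]
        self.v = [np.zeros_like(p) for p in params]
        self.t = 0

    def step(self, grads):
        self.t += 1
        for p, g, m, v in zip(self.params, grads, self.m, self.v):
            m[...] = self.beta1 * m + (1 - self.beta1) * g
            v[...] = self.beta2 * v + (1 - self.beta2) * g * g
            m_hat = m / (1 - self.beta1 ** self.t)  # bias-corrected first moment
            v_hat = v / (1 - self.beta2 ** self.t)  # bias-corrected second moment
            p -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


def train(model, images, batch_size=16, epochs=50, lr=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    opt = Adam(model.params, lr=lr)
    history = []
    for _ in range(epochs):
        order = rng.permutation(len(images))  # reshuffle every epoch
        for start in range(0, len(images), batch_size):
            _, grads = model.loss_and_grads(images[order[start:start + batch_size]])
            opt.step(grads)
        history.append(model.loss_and_grads(images)[0])  # full-data loss per epoch
    return history


images = np.random.default_rng(1).random((256, 8, 8))  # placeholder "images" in [0, 1)
model = ToyCAE()
initial = model.loss_and_grads(images)[0]
history = train(model, images)
print(f"reconstruction MSE: {initial:.4f} -> {history[-1]:.4f}")
```

Because the toy kernels do not overlap, backpropagation stays short; a realistic CAE would stack overlapping convolutions in a framework with automatic differentiation. After training, `model.encode` plays the role of the frozen encoder whose feature maps feed the downstream classifier.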

Abstract

Deepfake technology uses deep learning, particularly Generative Adversarial Networks (GANs), to generate highly realistic fake images, videos, and audio. It can manipulate existing media convincingly, for example by creating fake faces, altering voices, and generating content that appears genuine and authentic. As a result, deepfakes can fuel fake news, fraud, and harassment, erode trust in media, and facilitate cybercrime. To address these challenges, this study proposes a hybrid detection method. It incorporates a self-supervised convolutional autoencoder (CAE) in which the encoder compresses input images and the decoder reconstructs them, ensuring robust feature extraction. The CAE is trained from scratch using the Adam optimizer with a batch size of 16 for 50 epochs, together with data augmentation to enhance feature learning. After training, the encoder is frozen, and custom layers, including Global Average Pooling, Batch Normalization, and Dense layers with L1/L2 regularization, are appended to refine the extracted features. In parallel, a pre-trained Swin-Large Transformer (patch-4, window-7, 224-in22k) is extended with Batch Normalization and Dense blocks for enhanced semantic feature encoding. The feature vectors from the CAE encoder and the Swin Transformer are concatenated and passed to the classifier for the final prediction. The method is evaluated on five challenging deepfake datasets, namely OpenForensics, Flickr-140, Biometric, Liveness, and Deep Fake Detection (DFD), and achieves accuracies of 0.999, 0.996, 0.984, 0.988, and 0.917, respectively.
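The fusion stage described above can be sketched as an inference-time forward pass in NumPy. This is an illustrative sketch under stated assumptions, not the authors' code: random arrays stand in for the frozen CAE encoder's feature maps and for pooled features from the pre-trained Swin-Large backbone (1536-dimensional, the width of Swin-L's final stage), while the 256-unit Dense layers, ReLU activations, single-logit sigmoid classifier, L1/L2 coefficients, and helper names such as `hybrid_predict` are hypothetical. The L1/L2 term is computed separately because, during training, it is added to the classification loss rather than applied in the forward pass.

```python
import numpy as np


def batch_norm(x, mean, var, gamma, beta, eps=1e-3):
    """Inference-mode batch normalization with stored running statistics."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def dense(x, w, b, relu=False):
    y = x @ w + b
    return np.maximum(y, 0.0) if relu else y


def l1_l2_penalty(w, l1=1e-5, l2=1e-4):
    """Combined weight penalty l1 * sum|w| + l2 * sum(w^2), added to the training loss."""
    return l1 * np.abs(w).sum() + l2 * np.square(w).sum()


def make_branch(in_dim, out_dim, rng):
    """Hypothetical parameters for a BatchNorm -> Dense(ReLU) refinement block."""
    return {"mean": np.zeros(in_dim), "var": np.ones(in_dim),
            "gamma": np.ones(in_dim), "beta": np.zeros(in_dim),
            "w": rng.normal(0.0, 0.02, (in_dim, out_dim)), "b": np.zeros(out_dim)}


def run_branch(x, p):
    x = batch_norm(x, p["mean"], p["var"], p["gamma"], p["beta"])
    return dense(x, p["w"], p["b"], relu=True)


def hybrid_predict(cae_maps, swin_feats, cae_branch, swin_branch, head):
    """Return P(fake) per image from CAE feature maps and Swin features."""
    cae_vec = run_branch(cae_maps.mean(axis=(1, 2)), cae_branch)  # Global Average Pooling over H, W
    swin_vec = run_branch(swin_feats, swin_branch)
    fused = np.concatenate([cae_vec, swin_vec], axis=1)           # feature-level fusion
    logits = dense(fused, head["w"], head["b"])[:, 0]
    return 1.0 / (1.0 + np.exp(-logits))                          # sigmoid


rng = np.random.default_rng(0)
n_images, cae_channels, swin_dim = 4, 128, 1536          # hypothetical sizes
cae_maps = rng.random((n_images, 14, 14, cae_channels))  # stand-in for frozen CAE encoder output
swin_feats = rng.random((n_images, swin_dim))            # stand-in for pooled Swin-Large features
cae_branch = make_branch(cae_channels, 256, rng)
swin_branch = make_branch(swin_dim, 256, rng)
head = {"w": rng.normal(0.0, 0.02, (512, 1)), "b": np.zeros(1)}

probs = hybrid_predict(cae_maps, swin_feats, cae_branch, swin_branch, head)
reg = l1_l2_penalty(cae_branch["w"])  # regularizer on the CAE-branch Dense weights
print(probs.round(3), reg)
```

Concatenating the two 256-dimensional branch outputs yields a 512-dimensional fused vector, which is why the classifier weight matrix has shape (512, 1). In a trained model, the Batch Normalization statistics and Dense weights would come from training rather than from these placeholders.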
