Deepfake technology uses deep learning, in particular Generative Adversarial Networks (GANs), to generate highly realistic fake images, videos, and audio. It can manipulate existing media with high fidelity, for example by creating fake faces, altering voices, and generating content that appears genuine. As a result, deepfakes contribute to fake news, fraud, harassment, and loss of trust in media, and they facilitate cybercrime. A method is proposed to address these challenges. It incorporates a self-supervised convolutional autoencoder (CAE) in which the encoder compresses input images and the decoder reconstructs them, so that the encoder learns robust features. The CAE is trained from scratch with the Adam optimizer, a batch size of 16, and 50 epochs, together with data augmentation, to enhance feature learning. After training, the encoder is frozen, and custom layers, including Global Average Pooling, Batch Normalization, and Dense layers with L1/L2 regularization, are appended to refine the extracted features. In addition, a pre-trained Swin-Large Transformer model (patch-4, window-7, 224-in22k) is used, with appended Batch Normalization and Dense blocks for enhanced semantic feature encoding. The feature vectors from the CAE encoder and the Swin Transformer are concatenated and passed to the classifier for the final prediction. The method is evaluated on five challenging deepfake datasets, namely OpenForensics, Flickr-140, Biometric, Liveness, and Deep Fake Detection (DFD), achieving accuracies of 0.999, 0.996, 0.984, 0.988, and 0.917, respectively.
Javaria Amin studied this question.