Recent advances in computer vision and deep learning have accelerated the creation of increasingly realistic synthetic face images, such as Deepfakes, that can alter a person's appearance and emotional expression. Although deepfake and face-swap applications began innocently, they have been used to manipulate adult content featuring well-known people, and this misuse has disrupted society. To safeguard individuals' privacy and preserve social stability, it is essential to develop algorithms that can identify manipulated portrayals of faces in media. This paper presents a new approach to Deepfake video recognition. It uses an ensemble learning strategy built on a multimodal approach (CNN, MobileNetV2, and LSTM). Because video compression increases frame redundancy, the proposed method employs a frame-level stream to mitigate network noise and prevent the overfitting associated with compression. Feature extraction in the multimodal technique focuses on obtaining face regions from video frames. Experimental results show that the proposed model is effective, with an average prediction accuracy of 93.80% on the FaceForensics++ dataset and 98.13% on the Celeb-DF dataset. The results also show that the proposed strategy outperforms standalone models across diverse datasets.
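To make the ensemble idea concrete, the following minimal sketch fuses frame-level predictions from three models into one video-level decision. The abstract does not say how the models' outputs are combined, so the soft-voting (probability averaging) rule, the 0.5 threshold, the function names, and the example probabilities are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean


def video_score(frame_probs):
    """Average per-frame 'fake' probabilities into one video-level score."""
    if not frame_probs:
        raise ValueError("need at least one frame probability")
    return mean(frame_probs)


def ensemble_predict(per_model_frame_probs, threshold=0.5):
    """Soft voting (assumed fusion rule): average each model's video score."""
    scores = [video_score(p) for p in per_model_frame_probs.values()]
    fused = mean(scores)
    label = "fake" if fused >= threshold else "real"
    return fused, label


# Hypothetical per-frame outputs for four face crops from a single video.
outputs = {
    "cnn": [0.9, 0.8, 0.7, 0.8],
    "mobilenetv2": [0.6, 0.7, 0.5, 0.6],
    "lstm": [0.4, 0.5, 0.3, 0.4],
}

fused, label = ensemble_predict(outputs)
print(f"fused score = {fused:.2f}, label = {label}")
```

Averaging probabilities rather than taking a majority vote lets a confident model outweigh an uncertain one, which is one common reason to prefer soft voting. A weighted average or a learned meta-classifier would be equally plausible choices here.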
Ali et al. (Sat,) studied this question.