With the continuous evolution of deep learning, forgery techniques have advanced steadily, bringing convenience to individuals but also causing significant harm. Forged videos have become remarkably realistic, nearly indistinguishable from genuine footage to the human eye, which makes forgery detection a formidable challenge. However, many current Deepfake detection models focus on improving evaluation metrics and model architecture design, and often lack the necessary generality and practicality. In response to these challenges, we propose a Deepfake detection method based on a hybrid network. Our approach uses an improved EfficientNetV2S as the backbone, replacing the original Fused-Conv module with a Tok-MLP module and integrating an attention mechanism at the end of the backbone. The backbone's output is then fed into a Vision Transformer (ViT) for classification. After extensive data preprocessing, we train our model on three datasets: DFDC, Celeb-DF v2, and FaceForensics++. The results are highly competitive. Additionally, visual analysis of videos from the DFDC dataset supports the practicality of our approach. In conclusion, the relentless evolution of Deepfake technology poses both challenges and opportunities. Our Deepfake detection method, grounded in a hybrid network, improves on the capabilities of existing models, with an emphasis on practicality and effectiveness in real-world scenarios.
Deng et al. studied this question.
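To make the data flow described in the abstract concrete, the following is a minimal PyTorch sketch of the hand-off from a convolutional backbone, through an attention stage, into a ViT-style transformer classifier. It is an illustration under stated assumptions, not the authors' implementation: the small convolutional stack is a placeholder for the modified EfficientNetV2S (it contains no Tok-MLP module), the squeeze-and-excitation style gate stands in for the attention mechanism, whose type the abstract does not specify, and the class names `ChannelAttention` and `HybridDetector`, along with all layer sizes, are hypothetical.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style gate (assumed stand-in for the paper's attention)."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reweight each channel; the feature-map shape is unchanged.
        return x * self.gate(x)


class HybridDetector(nn.Module):
    """CNN features -> attention -> transformer encoder -> real/fake logits."""

    def __init__(self, channels=64, num_tokens=16, depth=2, heads=4, num_classes=2):
        super().__init__()
        # Placeholder backbone (NOT EfficientNetV2S, no Tok-MLP): 64x64 input -> 4x4 map.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, channels, kernel_size=3, stride=4, padding=1),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
            nn.Conv2d(channels, channels, kernel_size=3, stride=4, padding=1),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
        )
        self.attention = ChannelAttention(channels)
        # ViT-style learnable class token and positional embeddings.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, channels))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens + 1, channels))
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=heads, dim_feedforward=2 * channels, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(channels, num_classes)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.attention(self.backbone(images))      # (B, C, H', W')
        tokens = feats.flatten(2).transpose(1, 2)          # (B, H'*W', C)
        cls = self.cls_token.expand(tokens.size(0), -1, -1)
        sequence = torch.cat([cls, tokens], dim=1) + self.pos_embed
        encoded = self.encoder(sequence)
        return self.head(encoded[:, 0])                    # classify from the class token


if __name__ == "__main__":
    model = HybridDetector().eval()
    with torch.no_grad():
        # Two dummy 64x64 RGB face crops; one pair of logits per crop.
        print(model(torch.randn(2, 3, 64, 64)).shape)
```

With the default `num_tokens=16`, the sketch expects 64x64 inputs, since the placeholder backbone reduces them to a 4x4 feature map. Treating each spatial location of the feature map as one token is a common way to couple a CNN to a transformer; whether the paper tokenizes the backbone output in exactly this way is an assumption here.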