What question did this study set out to answer?

This review aims to evaluate and compare various deepfake detection techniques from traditional to advanced methods.

February 14, 2026 · Open Access

A Comprehensive Review of Deepfake Detection Techniques: From Traditional Machine Learning to Advanced Deep Learning Architectures

Key Points

Systematic review of peer-reviewed studies from 2018 to 2025
Analysis of detection methods utilizing deep learning, machine learning, and traditional image processing
Comparison based on accuracy, computing efficiency, and cross-dataset generalization using three benchmark datasets (FaceForensics++, DFDC, Celeb-DF)
Transformer architectures generalize better across datasets, with a performance decline of 11.33% versus more than 15% for CNNs
Traditional machine learning methods such as Random Forest achieve high accuracy (99.64% on DFDC) with lower computational needs
Performance deteriorates by 10–15% across all methods, which indicates that current systems may learn dataset-specific artifacts instead of generalizable deepfake features

Abstract

Deepfake technology poses unprecedented threats to the authenticity of digital media, and demand is high for reliable detection systems. This systematic review analyzes deepfake detection methods published from 2018 to 2025, covering deep learning approaches, machine learning methods, and classical image processing, with a specific focus on the trade-off between accuracy, computing efficiency, and cross-dataset generalization. Through extensive analysis of a robust body of peer-reviewed studies using three benchmark datasets (FaceForensics++, DFDC, Celeb-DF), we report findings that call some of the field’s prevailing assumptions into question. Our analysis produces three main results that change the understanding of detection abilities and limitations. First, transformer-based architectures generalize significantly better across datasets (11.33% performance decline) than CNN-based architectures (more than 15% decline), at the expense of 3–5× more computation. Second, there is no strong reason to assume the superiority of deep learning: traditional machine learning methods (in our case, Random Forest) perform comparably (99.64% accuracy on DFDC) with dramatically lower computing needs, which opens up prospects for their use in resource-constrained deployment scenarios. Third, and most critically, we show that performance deteriorates systematically (10–15% on average) across all methodological classes, and we provide empirical support for the view that current detection systems are, to a high degree, learning dataset-specific compression artifacts rather than generalizable deepfake characteristics.
These results highlight the importance of moving from accuracy-focused evaluation toward more comprehensive evaluation that balances generalization capability, computational feasibility, and practical deployment constraints, and they direct future research toward detection systems that can be deployed in practical applications.
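
As a minimal sketch of how a cross-dataset performance decline of the kind quoted above could be computed, the Python snippet below compares a detector's accuracy on its training benchmark with its accuracy on a different benchmark. The abstract does not state whether the reported declines are absolute (percentage points) or relative, so both are shown. The class `EvalResult`, the two helper functions, and all numbers are hypothetical placeholders for illustration and are not taken from the review.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Accuracy (in percent) of one detector; a hypothetical record type."""
    name: str
    in_dataset_acc: float     # trained and tested on the same benchmark
    cross_dataset_acc: float  # trained on one benchmark, tested on another


def absolute_decline(r: EvalResult) -> float:
    """Accuracy drop in percentage points."""
    return r.in_dataset_acc - r.cross_dataset_acc


def relative_decline(r: EvalResult) -> float:
    """Accuracy drop as a percentage of the in-dataset accuracy."""
    return 100.0 * (r.in_dataset_acc - r.cross_dataset_acc) / r.in_dataset_acc


# Placeholder numbers for illustration only; NOT results from the review.
results = [
    EvalResult("detector_a", in_dataset_acc=95.0, cross_dataset_acc=85.0),
    EvalResult("detector_b", in_dataset_acc=98.0, cross_dataset_acc=80.0),
]

for r in results:
    print(
        f"{r.name}: {absolute_decline(r):.2f} points absolute, "
        f"{relative_decline(r):.2f}% relative"
    )
```

Reporting both figures side by side makes it harder to confuse a drop of ten percentage points with a ten percent relative drop when comparing studies.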
