Abstract
Sophisticated deepfake technologies increasingly challenge the authenticity of digital media, underscoring the need for advanced multimodal detection methods. This review synthesizes recent deep learning approaches for identifying audio-visual forgeries, emphasizing fusion strategies that integrate visual and auditory signals to counter complex manipulations. By evaluating key public datasets and benchmarks, we highlight the efficacy of these approaches in critical applications, including social media content moderation, judicial forensics, and fraud prevention. Despite notable advances, limited cross-domain generalization and computational efficiency hinder practical deployment. Future efforts should focus on developing lightweight, scalable architectures and standardized evaluation protocols to improve detection robustness across diverse real-world scenarios and safeguard the integrity of digital content.
Dengtai Tan
Xiaojiang Peng
Chengyu Niu
Guilin University of Electronic Technology
Gansu Institute of Political Science and Law
Tan et al. studied this question.
www.synapsesocial.com/papers/68af5f13ad7bf08b1eae1e4e — DOI: https://doi.org/10.1007/s42452-025-07629-3