As a prominent form of AI-generated content, Deepfakes have raised substantial security concerns, since they make fraudulent activities harder to detect and increase their success rates in real-world scenarios. Most existing Deepfake research focuses on detection alone and fails to fully capture the subtle manipulation traces that each forgery algorithm leaves during synthesis. Yet attributing a manipulated video to its specific generation algorithm is also crucial, because it helps determine the type of forgery and limits the harm of widely disseminated misinformation. To fill this gap, this paper proposes a spatiotemporal artifact-aware framework that simultaneously performs two core tasks: Deepfake video detection and forgery algorithm attribution. Specifically, to comprehensively model the spatiotemporal information of a tampered video, the framework combines the local feature learning capability of Convolutional Neural Networks (CNNs) with the long-range dependency modeling of the Transformer, mining the traces left by the forgery process from both the local and global information of the input dynamic image sequence. To help the model capture robust forgery features, a frequency-domain filter is integrated into the convolutional features, amplifying the subtle traces left by synthesis algorithms. Furthermore, since forgery traces are multi-scale, we use both the middle-layer and deep-layer outputs of the backbone network to separately expose temporal defects at different feature levels. The final prediction for an input face sequence is obtained by fusing the predictions of these two branches. The proposed framework is trained under the joint supervision of cross-entropy loss, triplet loss, and hard sample mining loss.
This multi-loss optimization strategy improves intra-class compactness and inter-class separability, enabling the model to learn more discriminative features for both detection and attribution. Comprehensive experiments on the FaceForensics++ dataset show that the proposed method achieves 97.86±0.18% detection accuracy and 99.81±0.11% AUC, as well as 98.42±0.15% accuracy for forgery algorithm attribution, outperforming most state-of-the-art approaches on this dataset.
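The abstract does not specify the frequency-domain filter or how the middle-layer and deep-layer predictions are combined. As a minimal, dependency-free sketch under stated assumptions, the Python below applies a fixed high-pass mask in the 2-D Fourier domain to one feature-map channel (a stand-in for the paper's filter, which may well be learnable) and fuses the two branches by a weighted average of their class probabilities. The names `high_pass`, `fuse_predictions`, `cutoff`, and `weight` are hypothetical, and plain nested lists stand in for tensors.

```python
import cmath
import math


def dft2(x):
    """Naive 2-D discrete Fourier transform of an H x W grid (O((HW)^2), illustration only)."""
    h, w = len(x), len(x[0])
    return [
        [
            sum(
                x[m][n] * cmath.exp(-2j * math.pi * (u * m / h + v * n / w))
                for m in range(h)
                for n in range(w)
            )
            for v in range(w)
        ]
        for u in range(h)
    ]


def idft2(spec):
    """Inverse of dft2; returns the real part of the reconstructed grid."""
    h, w = len(spec), len(spec[0])
    return [
        [
            (sum(
                spec[u][v] * cmath.exp(2j * math.pi * (u * m / h + v * n / w))
                for u in range(h)
                for v in range(w)
            ) / (h * w)).real
            for n in range(w)
        ]
        for m in range(h)
    ]


def high_pass(feature_map, cutoff=1):
    """Hypothetical fixed filter: zero every coefficient whose signed frequency
    index is within `cutoff` of DC in both axes, keeping fine-grained detail."""
    h, w = len(feature_map), len(feature_map[0])
    spec = dft2(feature_map)
    for u in range(h):
        for v in range(w):
            # Map indices to signed frequencies, e.g. u = h - 1 becomes -1.
            fu = u if u <= h // 2 else u - h
            fv = v if v <= w // 2 else v - w
            if abs(fu) <= cutoff and abs(fv) <= cutoff:
                spec[u][v] = 0
    return idft2(spec)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(mid_logits, deep_logits, weight=0.5):
    """Assumed late fusion: weighted average of the two branches' probabilities."""
    p_mid, p_deep = softmax(mid_logits), softmax(deep_logits)
    return [weight * a + (1 - weight) * b for a, b in zip(p_mid, p_deep)]
```

For example, `high_pass` removes a constant offset from a 4 x 4 map while leaving a pixel-level checkerboard intact, which is the kind of high-frequency pattern such a filter is meant to emphasize. A practical model would use a batched FFT instead of this naive transform, and the paper may weight or combine its two branches differently.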
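The abstract names the three supervision terms but not their exact form or weighting. The Python below is a minimal sketch under stated assumptions: softmax cross-entropy for classification, a standard margin-based triplet loss over given (anchor, positive, negative) index triples, and, as an assumed stand-in for the "hard sample mining loss", batch-hard triplet mining (farthest positive and closest negative per anchor). The weights `lam_tri` and `lam_hard`, the `margin`, and all function names are hypothetical.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, via a stable log-sum-exp."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_z - logits[label]


def dist(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on d(a, p) - d(a, n) + margin: pulls same-class pairs together
    (compactness) and pushes different-class pairs apart (separability)."""
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)


def batch_hard_loss(embeddings, labels, margin=1.0):
    """Assumed hard-mining term: for each anchor, use its farthest positive and
    closest negative within the batch, then average the hinge losses."""
    losses = []
    for i, (e_i, y_i) in enumerate(zip(embeddings, labels)):
        pos = [dist(e_i, e_j) for j, (e_j, y_j) in enumerate(zip(embeddings, labels))
               if j != i and y_j == y_i]
        neg = [dist(e_i, e_j) for e_j, y_j in zip(embeddings, labels) if y_j != y_i]
        if pos and neg:
            losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0


def joint_loss(logits, embeddings, labels, triplets,
               lam_tri=1.0, lam_hard=1.0, margin=1.0):
    """Hypothetical weighted sum of the three terms over one batch."""
    ce = sum(cross_entropy(z, y) for z, y in zip(logits, labels)) / len(labels)
    tri = (sum(triplet_loss(embeddings[a], embeddings[p], embeddings[n], margin)
               for a, p, n in triplets) / len(triplets)) if triplets else 0.0
    hard = batch_hard_loss(embeddings, labels, margin)
    return ce + lam_tri * tri + lam_hard * hard
```

When embeddings of different classes collapse onto the same point, both metric-learning terms equal the margin; once the classes are separated by more than the margin, they drop to zero and only the cross-entropy term remains. That behavior is what the "intra-class compactness and inter-class separability" claim refers to, although the paper's actual loss formulation may differ.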
Liu et al. studied this question.