We propose a multi-modal detection method for deepfake videos, called Incompatibility Between Multiple Modes (IBMM) detection. The algorithm classifies a video as real or fake and may be embedded in monitoring equipment in the future. The model combines EfficientNet with a simple 3D-CNN and identifies deepfake videos through three modes: facial motion, lip motion, and audio. In the facial motion and lip motion modes, we use EfficientNet for feature learning. This network scales its dimensions uniformly with a set of fixed scaling coefficients and achieves good results in learning image features. In the audio mode, we use a 3D-CNN trained on the hot coding diagram of the audio data. Within a single mode, we use the cross-entropy loss to measure the irrationality of that mode. Across modes, we use the contrastive loss to measure the incongruity between them, such as the incompatibility between lips and voice. Experimental results show that the proposed method achieves higher accuracy (95.87%) on the DFDC dataset than existing fake detection methods, an improvement of 5.21%.
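The way the two losses fit together can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the margin-based form of the contrastive loss, the convention that label 1 means fake, the pairing of every mode with every other mode, the `weight` parameter, and all function names.

```python
import math
from itertools import combinations


def cross_entropy(p_fake, label):
    """Binary cross-entropy for a single mode (assumed: label 1 = fake, 0 = real)."""
    eps = 1e-12
    p = min(max(p_fake, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive(a, b, label, margin=1.0):
    """Margin-based contrastive loss between two modes (assumed form).

    Real videos (label 0): the modes should agree, so distance is penalized.
    Fake videos (label 1): the modes should disagree by at least `margin`.
    """
    d = euclidean(a, b)
    if label == 0:
        return d ** 2
    return max(margin - d, 0.0) ** 2


def total_loss(probs, embeddings, label, margin=1.0, weight=1.0):
    """Sum of per-mode cross-entropy plus weighted pairwise contrastive terms.

    `probs` maps mode name -> predicted probability of fake.
    `embeddings` maps mode name -> feature vector.
    Pairing all modes with each other is an assumption of this sketch.
    """
    ce = sum(cross_entropy(p, label) for p in probs.values())
    con = sum(
        contrastive(embeddings[m1], embeddings[m2], label, margin)
        for m1, m2 in combinations(sorted(embeddings), 2)
    )
    return ce + weight * con
```

Under these assumptions, a real video whose lip and audio features coincide contributes no contrastive penalty, while a fake video is penalized until its modes sit at least `margin` apart.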
Yuxin Zhang
Jinyu Zhan
Wei Jiang
University of Electronic Science and Technology of China
Zhang et al. (Sun,) studied this question.
www.synapsesocial.com/papers/6a151425a05db7ab4b62e140 — DOI: https://doi.org/10.1109/icites53477.2021.9637096