As deepfake methods have advanced, large numbers of deepfake images and videos have spread across the internet, raising public concern about the authenticity of information. Deepfake detection has consequently become an active topic in computer vision, and many methods have been proposed. Frequency-based detection methods achieve strong results, but two issues remain: (a) they use fixed filters that attend to fixed frequency bands and regions, which makes them easily distracted by irrelevant information and inflexible across different forgery methods; (b) methods that fuse frequency-domain and RGB information with CNNs do not model global relationships, and so cannot fully exploit both types of information. To address these issues, we introduce the Frequency-Enhanced Transformer Network (FETNet). Specifically, we propose a Frequency Feature Enhancement Module (FFEM), a learnable module that flexibly enhances important frequency bands and regions in the original frequency features. We also present a Feature Fusion Transformer (FFT) that uses global information to fuse features from the RGB and frequency domains, yielding a more comprehensive feature representation. Extensive experiments on the FF++ dataset demonstrate the effectiveness and superiority of our approach.
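The two ideas in the abstract, reweighting frequency bands with learnable gains instead of a fixed filter, and fusing RGB and frequency features through global attention, can be illustrated with a minimal NumPy sketch. This is not the authors' FFEM or Feature Fusion Transformer, whose internals the abstract does not specify. The function names `radial_band_index`, `enhance_bands`, and `cross_attention_fuse`, the radial band layout, the sigmoid gain in (0, 2), and the projection-free attention are all assumptions made for illustration. Note that `np.fft` here refers to the fast Fourier transform, not to the paper's FFT module.

```python
import numpy as np


def radial_band_index(h, w, n_bands):
    """Map each coefficient of a centered (fftshift-ed) spectrum to a radial band.

    Band 0 holds the lowest frequencies, band n_bands - 1 the highest.
    The concentric-ring layout is an assumption, not taken from the paper.
    """
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.hypot(yy - h // 2, xx - w // 2)
    idx = (radius / radius.max() * n_bands).astype(int)
    return np.minimum(idx, n_bands - 1)


def enhance_bands(image, band_logits):
    """Rescale each frequency band of a grayscale image by its own gain.

    band_logits stands in for learnable parameters: a logit of 0 gives a
    gain of 1 (band unchanged), negative logits suppress a band, positive
    logits amplify it (up to a gain of 2). A fixed filter would hard-code
    these gains instead.
    """
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    gains = 2.0 / (1.0 + np.exp(-np.asarray(band_logits, dtype=float)))
    mask = gains[radial_band_index(h, w, len(gains))]
    restored = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return restored.real


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_fuse(rgb_tokens, freq_tokens):
    """Fuse two token sets with one projection-free cross-attention step.

    Every RGB token attends to every frequency token, so the fusion is
    global rather than limited to a convolution's local window. A real
    transformer layer would add learned query/key/value projections,
    multiple heads, normalization, and a feed-forward block.
    """
    d = rgb_tokens.shape[-1]
    attn = softmax(rgb_tokens @ freq_tokens.T / np.sqrt(d), axis=-1)
    return rgb_tokens + attn @ freq_tokens  # residual connection


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((8, 8))
    # Keep the lowest band, suppress the three higher bands.
    smoothed = enhance_bands(img, [0.0, -50.0, -50.0, -50.0])
    fused = cross_attention_fuse(rng.random((5, 16)), rng.random((7, 16)))
    print(smoothed.shape, fused.shape)
```

In a trained model the band logits would be optimized jointly with the detector, which is what lets the emphasis shift between forgery methods rather than staying pinned to one hand-chosen band.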
Zheng et al. (Fri,) studied this question.