What question did this study set out to answer?

Can deepfake detection become more accurate and generalize better across datasets when global feature modeling is combined with landmark-guided recognition of local facial features?

April 3, 2026 | Open Access

Towards Generalizable Deepfake Detection via Facial Landmark-Guided Convolution and Local Structure Awareness

Key Points

Introduced Landmark-Guided Convolution (LGConv) to focus convolutional sampling on facial landmarks.
Developed a Facial Structure Awareness Block (FSAB) that works alongside a VMamba-based model.
Employed a multi-stage residual design with a CBAM attention mechanism to enhance sensitivity to facial artifacts.
Achieved AUC scores of 92.34% on CD1 and 96.01% on CD2 in cross-dataset evaluations.
Outperformed existing mainstream approaches in accuracy and sensitivity to facial forgery.
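
The AUC figures above summarize how well a detector ranks fake samples above real ones. As a minimal sketch of how such a score can be computed, the function below uses the pairwise (Mann-Whitney) definition of ROC AUC. The function name `roc_auc`, the label convention (1 = fake, 0 = real), and the toy scores are assumptions made for illustration; none of them come from the paper.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison (Mann-Whitney U statistic).

    labels: 1 for fake, 0 for real (assumed convention, not from the paper).
    scores: detector outputs where a higher value means "more likely fake".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # fake ranked above real: correct ordering
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy scores invented for illustration only (not data from the paper).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 fake/real pairs are ordered correctly
```

An AUC of 1.0 means every fake sample outscores every real one, while 0.5 corresponds to random ranking. The quadratic loop is fine for a demonstration; a rank-based implementation would be preferable on large test sets.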

Abstract

As deepfakes become increasingly realistic, there is a growing need for robust and highly accurate facial forgery detection algorithms. Existing studies show that global feature modeling approaches (e.g., Transformer, VMamba) are effective in capturing long-range dependencies, yet they often lack sufficient sensitivity to localized facial tampering artifacts. Meanwhile, traditional convolutional methods excel at extracting local image features but struggle to incorporate prior knowledge about facial anatomy, resulting in limited representational capability. To address these limitations, this paper proposes LGMamba, a novel detection framework that combines global modeling with facial guidance focused on the key facial components and fine-grained detail regions commonly manipulated in deepfakes. First, we introduce an innovative Landmark-Guided Convolution (LGConv), which adaptively adjusts convolutional sampling positions using facial landmark information. This allows the model to attend to forgery-prone facial regions, such as the eyes and mouth. Second, we design a Facial Structure Awareness Block (FSAB) that operates in parallel with the VMamba-based visual State-Space Model. Equipped with a multi-stage residual design and a CBAM attention mechanism, FSAB enhances the model’s sensitivity to subtle facial artifacts, enabling joint exploitation of global semantic consistency and fine-grained forgery cues within a unified architecture. In cross-dataset evaluations, LGMamba attains AUC scores of 92.34% on CD1 and 96.01% on CD2, outperforming all compared mainstream methods.
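
The abstract does not say how LGConv derives its sampling positions, so the following is only one hypothetical reading, loosely in the spirit of deformable convolution: each tap of a 3x3 kernel is pulled toward its nearest facial landmark and the image is sampled there with bilinear interpolation. The names `bilinear` and `landmark_guided_response`, the pull-strength parameter `alpha`, and the toy image are all assumptions for illustration and should not be taken as the authors' method.

```python
import math


def bilinear(img, y, x):
    """Bilinearly sample a 2D list-of-lists image at fractional (y, x), clamped to the border."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
    bot = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
    return top * (1 - dy) + bot * dy


def landmark_guided_response(img, kernel, cy, cx, landmarks, alpha=0.5):
    """One output value of a 3x3 convolution whose taps are pulled toward landmarks.

    alpha is a hypothetical pull strength: 0 gives a regular 3x3 convolution,
    1 moves every tap onto its nearest landmark.
    """
    out = 0.0
    for i, oy in enumerate((-1, 0, 1)):
        for j, ox in enumerate((-1, 0, 1)):
            py, px = cy + oy, cx + ox  # regular grid position of this tap
            # Nearest landmark (row, col) to the regular tap position.
            ly, lx = min(landmarks, key=lambda p: (p[0] - py) ** 2 + (p[1] - px) ** 2)
            sy = py + alpha * (ly - py)  # shifted sampling row
            sx = px + alpha * (lx - px)  # shifted sampling column
            out += kernel[i][j] * bilinear(img, sy, sx)
    return out


# Toy 5x5 image and averaging kernel, invented for illustration only.
toy_img = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
mean_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
toy_landmarks = [(1, 3)]  # a single made-up landmark at row 1, column 3

regular = landmark_guided_response(toy_img, mean_kernel, 2, 2, toy_landmarks, alpha=0.0)
guided = landmark_guided_response(toy_img, mean_kernel, 2, 2, toy_landmarks, alpha=1.0)
```

With `alpha=0.0` the function reduces to an ordinary 3x3 average around the center pixel; with `alpha=1.0` every tap collapses onto the landmark. In a trainable layer the offsets would more plausibly be predicted from landmark features rather than fixed by a nearest-landmark rule, which is a design choice this sketch leaves open.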
