Face anti-spoofing, commonly known as facial liveness detection, is a critical security component of modern biometric authentication systems. It determines whether a presented face belongs to a live person or is a fraudulent artefact such as a printed photograph, a video replay, or a three-dimensional mask. This paper proposes a hybrid deep learning architecture that fuses a Convolutional Neural Network (CNN) with a Vision Transformer (ViT) for binary liveness classification on the CASIA Face Anti-Spoofing Database. The CNN component captures fine-grained local texture cues indicative of spoofing artefacts, while the ViT component models global spatial relationships across image patches through multi-head self-attention. For comparison, a standalone ViT and a classical pipeline combining Local Binary Pattern (LBP) features with a Support Vector Machine (SVM) classifier are also evaluated. The proposed hybrid model achieves the highest classification accuracy of the three, supporting the view that joint local-global feature extraction yields more discriminative representations for face anti-spoofing.
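To make the classical baseline concrete, the sketch below computes a basic LBP texture descriptor. It is a minimal illustration only, assuming the simplest 3x3, 8-neighbour, radius-1 LBP operator with a 256-bin normalised histogram. The abstract does not state the LBP configuration actually used (radius, neighbour count, uniform patterns, or spatial grid), so the function names and parameters here are hypothetical and not the authors' implementation.

```python
from typing import List

# Hypothetical choice: the 8 neighbours of a pixel, clockwise from the top-left.
# Each neighbour contributes one bit to the LBP code.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: List[List[int]], r: int, c: int) -> int:
    """Return the 8-bit LBP code of interior pixel (r, c).

    A bit is set when the neighbour's intensity is >= the centre intensity.
    """
    centre = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(img: List[List[int]]) -> List[float]:
    """Compute a normalised 256-bin LBP histogram over all interior pixels.

    Border pixels are skipped because they lack a full 3x3 neighbourhood.
    """
    rows, cols = len(img), len(img[0])
    hist = [0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    # Normalise so images of different sizes yield comparable feature vectors.
    return [h / total for h in hist] if total else [0.0] * 256


if __name__ == "__main__":
    # A flat patch: every neighbour equals the centre, so every bit is set.
    flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
    print(lbp_histogram(flat)[255])  # prints 1.0
```

In a pipeline of this kind, one histogram per face image would serve as the feature vector for an SVM classifier (for example, scikit-learn's `sklearn.svm.SVC`) trained on live versus spoof labels. The bit ordering above is only a convention; a different ordering permutes the histogram bins without changing the information they carry.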
Nayyar et al. studied this question.