What question did this study set out to answer?

This research examines the influence of reasoning-based supervision on face anti-spoofing effectiveness.

March 15, 2026 | Open Access

Analyzing the effect of reasoning-based supervision on face anti-spoofing

Key Points

Developed an explanation-augmented benchmark using four FAS datasets
Utilized a vision-language model to analyze the impact of natural language explanations
Adopted dual-objective training combining spoof classification and explanation generation loss
Found that reasoning-style captions can enhance detection performance and domain generalization in many settings
Showed that these captions also introduce inductive biases that may degrade performance when the emphasized cues are misaligned with unseen attack types
Provided explanation annotations and metadata for reproducibility via a Hugging Face repository
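
The dual-objective training named in the key points can be sketched as a weighted sum of a classification loss and a caption loss. This is a minimal illustration only, not the authors' implementation: the weight `lam`, the function names, and the use of mean per-token cross-entropy for the explanation term are all assumptions, since the summary does not state how the two terms are defined or balanced.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one example: -log p(target)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def dual_objective_loss(cls_logits, label, token_logits, token_ids, lam=1.0):
    """Spoof classification loss plus lam times the mean per-token caption loss.

    lam is a hypothetical balancing weight (an assumption, not from the paper).
    Returns (total, classification_term, generation_term).
    """
    cls_loss = cross_entropy(cls_logits, label)
    gen_loss = sum(
        cross_entropy(step_logits, tok)
        for step_logits, tok in zip(token_logits, token_ids)
    ) / len(token_ids)
    return cls_loss + lam * gen_loss, cls_loss, gen_loss

# Toy example: 2 classes (live/spoof), a 2-token caption over a 4-word vocabulary.
total, cls_term, gen_term = dual_objective_loss(
    cls_logits=[0.0, 0.0], label=1,
    token_logits=[[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
    token_ids=[1, 3], lam=0.5,
)
print(total, cls_term, gen_term)
```

Setting `lam` to zero recovers a classification-only baseline, which is one way to isolate the effect of explanation supervision while the backbone stays fixed.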

Abstract

Face anti-spoofing (FAS) has become a crucial component in securing face recognition systems against presentation attacks, such as printed photos, replay videos, and 3D masks. While recent advances have improved generalization to unseen spoofing attempts, many existing methods remain black-box models that provide binary decisions without interpretable reasoning. In this paper, we investigate explainable face anti-spoofing from a supervision-centric perspective, using a vision-language model (VLM) to analyze how natural language explanations influence model behavior. To enable this study under controlled conditions, we construct an explanation-augmented benchmark by enriching four standard FAS datasets (MSU-MFSD, CASIA-FASD, Replay-Attack, and OULU-NPU) with both vanilla and reasoning-structured captions generated via the GPT-4o API. We further adopt a dual-objective training strategy that combines spoof classification loss with explanation generation loss, allowing us to examine the effect of explanation-based supervision while keeping the backbone architecture fixed. Through extensive cross-dataset evaluations, we show that reasoning-style captions can enhance detection performance and domain generalization in many settings, while also introducing inductive biases that may degrade performance when emphasized cues are misaligned with unseen attack types. These findings suggest that explanations in FAS should be viewed not only as interpretable outputs, but also as controllable training signals that shape generalization behavior. To support reproducibility, we publicly release the explanation annotations and associated metadata, excluding all face images, via a Hugging Face repository at https://huggingface.co/datasets/DescriptiveFAS/MCIO_public.
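
The abstract does not spell out the cross-dataset protocol. One common arrangement in FAS work is leave-one-dataset-out, where each dataset is held out for testing in turn. The sketch below enumerates those splits for the four named datasets, assuming that protocol; the function name is hypothetical and nothing here is taken from the paper's code.

```python
DATASETS = ["MSU-MFSD", "CASIA-FASD", "Replay-Attack", "OULU-NPU"]

def leave_one_out_splits(datasets):
    """Yield (train_sets, test_set) pairs, holding out each dataset once.

    Assumes a leave-one-dataset-out protocol, which the summary does not confirm.
    """
    for held_out in datasets:
        train = [d for d in datasets if d != held_out]
        yield train, held_out

splits = list(leave_one_out_splits(DATASETS))
for train, test in splits:
    print(f"train on {', '.join(train)} -> test on {test}")
```

Each split trains on three source domains and tests on an unseen fourth, which is the setting where a mismatch between the cues emphasized in captions and the held-out attack types would show up.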

Min et al. studied this question.

synapsesocial.com/papers/69b5ff3b83145bc643d1b633 https://doi.org/10.1038/s41598-026-43800-5
