What question did this study set out to answer?

This research aims to understand how different deepfake types and brief viewing durations impact human detection abilities.

March 28, 2026 · Open Access

We Are Good at Detecting Stable-Diffusion-Generated Deepfakes but Not StyleGAN-Generated or Face-Swapped Deepfakes

Key Points

The study used a within-subject design with 38 participants.
It tested 6 viewing durations (17–1000 ms) and 4 image types (real, Stable-Diffusion, StyleGAN, face-swapped).
Eye movements were tracked while participants judged whether each face was AI-generated.
Generalized Linear Mixed Models were used for the analysis.
Stable-Diffusion-generated and real images were identified accurately (83–88%) at every viewing duration, including 17 ms.
StyleGAN-generated images were rarely detected (4–11% accuracy), and accuracy declined with longer viewing.
Face-swapped images were also hard to detect (12–29% accuracy), though accuracy improved with longer viewing.
Eye tracking showed fewer but longer fixations, concentrated on the nose, for face-swapped images.
Overall, human detection performance was comparable to that of the second-best of five off-the-shelf detection models across all deepfake types.

Abstract

Some face images on social media are generated by algorithms. Recent research found that StyleGAN-generated faces are indistinguishable from real faces for humans under unlimited viewing duration. However, it is unclear whether different generation algorithms and brief exposure (common on social media) affect human detection. Using a 6 (viewing duration: 17–1000 ms) × 4 (image type: real, Stable-Diffusion, StyleGAN, face-swapped) within-subject design, we investigated how viewing duration and deepfake type jointly shape human detection and attention. Thirty-eight participants viewed faces and judged whether each was AI-generated while their eye movements were tracked. Generalized Linear Mixed Models (N = 38) showed that image type, viewing duration, and their interaction significantly influenced detection accuracy (all ps < .001). Stable-Diffusion-generated and real images were accurately identified even at the shortest duration (accuracy: 83–88% across all viewing durations), while StyleGAN-generated and face-swapped images were difficult to detect (accuracy: 4–11% and 12–29%, respectively). Interestingly, detection accuracy improved with longer viewing duration for face-swapped images but declined for StyleGAN-generated images. Eye movement analysis (N = 37) showed that face-swapped images elicited fewer but longer fixations with nose-focused patterns, suggesting that a more holistic processing strategy was used for swapped faces. Compared with five off-the-shelf deepfake detection models, human performance was comparable to that of the second-best model across all deepfake types. As the first study to systematically test three major deepfake types from ultra-brief to moderate viewing durations, our study fills a gap in prior research, which has been limited to single deepfake types or long durations, and improves understanding of face perception mechanisms in the age of AI.

• Stable-Diffusion-generated faces could be detected as fake with only 17 ms of viewing.
• StyleGAN-generated and face-swapped images were easily misidentified as real.
• Viewing duration significantly affected human detection of deepfakes.
• Viewing duration and deepfake type had an interaction effect on detection.
• People tend to fixate on the nose region when detecting deepfakes.
