What question did this study set out to answer?

The research aims to improve driver drowsiness detection by incorporating gender-specific behavioral differences.

April 1, 2026 | Open Access

Gender-Aware Driver Drowsiness Detection Using Multi-Stream Shifted-Window-Based Hierarchical Vision Transformers

Key Points

Developed a multi-stream transformer architecture for drowsiness detection.
Integrated gender embedding to modulate feature learning.
Evaluated model variants using the NTHU-DDD dataset across three facial regions.
Conducted an ablation study to find the best model configuration.
Achieved 95.47% accuracy on the evaluation set.
The three-stream model outperformed other variants in detection performance.
Incorporating gender awareness led to noticeable improvements in detection accuracy.
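The idea of a gender embedding that modulates per-region features can be illustrated with a small sketch. This is not the authors' implementation: it assumes a scale-and-shift form of modulation, an integer gender label, and plain concatenation in place of the shifted-window attention fusion. The names `scale_table`, `shift_table`, `modulate`, and `fuse_streams`, and the feature width, are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_GENDERS = 2   # assumption: gender encoded as an integer label, 0 or 1
FEATURE_DIM = 8   # hypothetical width of each stream's feature vector

# Hypothetical stand-ins for learned embedding tables:
# one scale vector and one shift vector per gender label.
scale_table = 1.0 + 0.1 * rng.standard_normal((NUM_GENDERS, FEATURE_DIM))
shift_table = 0.1 * rng.standard_normal((NUM_GENDERS, FEATURE_DIM))


def modulate(features, gender_ids):
    """Scale and shift features of shape (batch, dim) using each sample's gender label."""
    gamma = scale_table[gender_ids]  # (batch, dim)
    beta = shift_table[gender_ids]   # (batch, dim)
    return gamma * features + beta


def fuse_streams(stream_features, gender_ids):
    """Modulate every stream, then concatenate along the feature axis.

    Concatenation is a simplification chosen for this sketch; the paper
    describes shifted-window attention for spatial feature fusion.
    """
    modulated = [modulate(f, gender_ids) for f in stream_features]
    return np.concatenate(modulated, axis=-1)


# Three streams (face, eyes, mouth) for a batch of four samples.
batch = 4
gender_ids = np.array([0, 1, 1, 0])
streams = [rng.standard_normal((batch, FEATURE_DIM)) for _ in ("face", "eyes", "mouth")]
fused = fuse_streams(streams, gender_ids)
print(fused.shape)  # (4, 24)
```

A gender-agnostic variant of this sketch would skip `modulate` and concatenate the raw stream features, which is one way to picture the comparison the ablation study makes.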

Abstract

Driver fatigue contributes substantially to traffic accidents, so detecting and mitigating it has become one of the main goals of intelligent driver-assistance systems that seek to enhance driving safety and comfort. Among various approaches, vision-based facial analysis using deep learning has emerged as an effective and non-intrusive method for identifying driver drowsiness, a key manifestation of fatigue. However, current drowsiness detection models do not account for demographic factors such as gender, even though recent research has shown gender-related differences in behaviors such as eye closure duration, blink frequency, yawning patterns, and facial muscle relaxation. In this paper, we present a fine-grained multi-stream transformer architecture that incorporates gender awareness and shifted-window attention for spatial feature fusion. Integrating a gender embedding that modulates the region-based features allows the model to learn gender-conditioned drowsiness features, reducing bias and diluted representations. Using the NTHU-DDD dataset, we evaluated two-stream and three-stream variants in gender-aware and gender-agnostic configurations across three facial region contexts: the face region with a 20% margin, the bare face region, and key facial regions (face, eyes, and mouth). A comprehensive ablation study was conducted to identify the most effective model setup. The results demonstrate that incorporating the gender embedding improves detection performance, achieving an accuracy of 95.47% on the evaluation set. Moreover, the proposed three-stream model (SWT-DD-3S) outperformed the other variants.
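The "face region with a 20% margin" context can be pictured as a detected face box enlarged before cropping. The abstract does not define how the margin is measured, so the sketch below assumes it is a fraction of the box width and height added on every side and clamped to the image bounds. The function name `expand_box` and the `(x1, y1, x2, y2)` box format are hypothetical.

```python
def expand_box(box, margin, image_width, image_height):
    """Enlarge an (x1, y1, x2, y2) box by `margin` of its size on each side.

    Assumption: the margin is relative to the box's own width and height,
    and the result is clamped so it never leaves the image.
    """
    x1, y1, x2, y2 = box
    dx = (x2 - x1) * margin
    dy = (y2 - y1) * margin
    return (
        max(0, round(x1 - dx)),
        max(0, round(y1 - dy)),
        min(image_width, round(x2 + dx)),
        min(image_height, round(y2 + dy)),
    )


# A 100x100 face box in a 640x480 frame grows by 20 pixels per side.
print(expand_box((100, 100, 200, 200), 0.2, 640, 480))  # (80, 80, 220, 220)

# Near the image corner, the expanded box is clamped at zero.
print(expand_box((10, 10, 110, 110), 0.2, 640, 480))    # (0, 0, 130, 130)
```

Passing `margin=0.0` returns the box unchanged, which corresponds to the bare face region context under the same assumptions.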
