Driver fatigue contributes substantially to traffic accidents, so detecting and mitigating it has become one of the main goals of intelligent driver-assistance systems for enhancing driving safety and comfort. Among various approaches, vision-based facial analysis using deep learning has emerged as an effective and non-intrusive method for identifying driver drowsiness, a key manifestation of fatigue. However, current drowsiness detection models do not account for demographic factors such as gender, even though recent research has shown gender-related differences in behaviors such as eye closure duration, blink frequency, yawning patterns, and facial muscle relaxation. In this paper, we present a fine-grained multi-stream transformer architecture that incorporates gender awareness and shifted-window attention for spatial feature fusion. Integrating a gender embedding that modulates the region-based features allows the model to learn gender-conditioned drowsiness features, reducing bias and diluted representations. Using the NTHU-DDD dataset, we evaluated two-stream and three-stream variants in gender-aware and gender-agnostic configurations across three facial region contexts: the face region with a 20% margin, the bare face region, and key facial regions (face, eyes, and mouth). A comprehensive ablation study was conducted to identify the most effective model setup. The results demonstrate that incorporating the gender embedding improves detection performance, achieving an accuracy of 95.47% on the evaluation set. Moreover, the proposed three-stream model (SWT-DD-3S) produced better results than the two-stream variant.
Nurnoby et al. studied this question.
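To make the idea of a gender embedding modulating region-based features more concrete, the following is a minimal sketch, not the architecture described in the abstract. It assumes a FiLM-style scale-and-shift conditioning, a binary gender label, and a tiny feature size. The names `GAMMA`, `BETA`, `make_table`, and `modulate` are hypothetical, and the random tables merely stand in for parameters that would be learned during training.

```python
import random

NUM_GENDERS = 2   # assumption: binary gender label
FEATURE_DIM = 4   # assumption: tiny feature size, for illustration only


def make_table(rows, cols, scale, seed):
    """Return a rows x cols table of small random values (stand-in for learned weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


# Hypothetical parameters: one scale vector and one shift vector per gender label.
GAMMA = make_table(NUM_GENDERS, FEATURE_DIM, 0.1, seed=0)
BETA = make_table(NUM_GENDERS, FEATURE_DIM, 0.1, seed=1)


def modulate(region_features, gender_id):
    """Scale and shift every region token with the gender-specific vectors.

    region_features: list of tokens, each a list of FEATURE_DIM floats.
    gender_id: integer index into the embedding tables.
    """
    gamma, beta = GAMMA[gender_id], BETA[gender_id]
    return [
        [x * (1.0 + g) + b for x, g, b in zip(token, gamma, beta)]
        for token in region_features
    ]


# Two illustrative tokens from one facial region stream (e.g. the eyes).
eye_tokens = [[0.5, -1.0, 2.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
out_a = modulate(eye_tokens, gender_id=0)
out_b = modulate(eye_tokens, gender_id=1)
```

In this sketch the same region features yield different conditioned features for each gender label, which is the property a gender-aware stream would rely on. Setting the scale and shift vectors to zero would recover a gender-agnostic stream.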