Driver drowsiness is a major cause of road accidents worldwide, leading to thousands of fatalities and billions in economic losses each year. This study presents a real-time drowsiness detection framework based on a hybrid Vision Transformer-Convolutional Neural Network (ViT-CNN) architecture enhanced with multi-task learning and temporal attention. Unlike traditional sensor-based methods or reactive methods that monitor vehicle behavior, the proposed vision-based approach provides a non-intrusive, scalable solution capable of detecting early fatigue indicators such as eye closure, yawning, and changes in head pose. The model leverages self-supervised pretraining on over 1.2 million unlabeled driving videos and is optimized for embedded deployment using INT8 quantization and TensorRT, achieving 99.27% accuracy, F1 = 0.98, and AUC = 0.998 while sustaining 42 FPS at 42 ms latency on the NVIDIA Jetson AGX Xavier. Explainability tools (Grad-CAM++ and Bayesian uncertainty estimation) support transparency in safety-critical contexts. Evaluation across six datasets demonstrates strong generalization, and the framework is adaptable to other fatigue-sensitive domains such as aviation and industrial safety.
OISE et al. (Tue,) studied this question.