Driver fatigue remains a key safety issue in autonomous vehicles. Traditional unimodal methods have clear limitations: vision-based approaches are sensitive to environmental conditions, while EEG signals suffer from noise and poor cross-subject generalization. This review analyzes multimodal fusion strategies that address these challenges. Early fusion integrates low-level features (e.g., visual cues and physiological signals) for joint representation learning, which captures cross-modal correlations and improves real-time performance. Late fusion dynamically adjusts modality weights based on high-level predictions, improving robustness in changing environments. Experiments show that early fusion achieves 92% accuracy in stable conditions, while late fusion reduces false alarms by 18% in dynamic scenarios. Deep learning architectures, particularly Transformer-based attention mechanisms and lightweight edge-compatible models, balance computational efficiency and accuracy. Future directions emphasize privacy-preserving federated learning and automotive-grade hardware-software co-design. By bridging laboratory prototypes and real-world needs, this work lays the groundwork for holistic "fatigue intervention" systems in autonomous driving.
Xiang Huang studied this question.
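The contrast between early and late fusion described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the function names `early_fusion` and `late_fusion`, the feature values, the per-modality reliability scores, and the choice of a softmax to turn those scores into weights. It is not the method of any specific system covered by the review.

```python
import math


def early_fusion(visual_feats, eeg_feats):
    """Early fusion: join low-level features into one vector.

    A single downstream model (not shown) would learn from this joint
    representation, so it can pick up cross-modal correlations.
    """
    return list(visual_feats) + list(eeg_feats)


def late_fusion(probs, reliabilities):
    """Late fusion: weighted average of per-modality fatigue probabilities.

    `reliabilities` are hypothetical scores (higher = more trustworthy
    right now). A softmax converts them into weights that sum to 1.
    """
    peak = max(reliabilities)  # subtract the max for numerical stability
    exps = [math.exp(r - peak) for r in reliabilities]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = sum(w * p for w, p in zip(weights, probs))
    return fused, weights


# Hypothetical inputs: two visual features and three EEG features.
joint = early_fusion([0.31, 0.82], [0.45, 0.12, 0.67])

# Per-modality fatigue probabilities: [vision, EEG].
equal_fused, equal_w = late_fusion([0.9, 0.4], [0.0, 0.0])

# Vision assumed degraded (e.g. poor lighting), so its weight drops.
night_fused, night_w = late_fusion([0.9, 0.4], [-2.0, 1.0])
```

With equal reliability scores the two predictions are simply averaged. When the vision score is lowered, the fused estimate moves toward the EEG prediction, which is the kind of dynamic reweighting the abstract attributes to late fusion.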