What question did this study set out to answer?

This research aims to develop an adaptive multimodal deep learning framework for real-time monitoring of driver cognitive states, focusing on fatigue and cognitive overload.

May 8, 2026 · Open Access

Adaptive multimodal learning for driver cognitive state monitoring using transformer-based fusion with personalized meta-learning and federated optimization

Key Points

This research aims to develop an adaptive multimodal deep learning framework for real-time monitoring of driver cognitive states, focusing on fatigue and cognitive overload.
Utilized the CL-Drive dataset, which includes EEG, ECG, EDA, and gaze tracking data from 21 participants.
Implemented a hybrid CNN–BiLSTM architecture for feature extraction and a transformer-based network for cross-modal attention.
Adopted personalized meta-learning combined with federated optimization for decentralized model updates and privacy preservation.
Achieved 80.5 ± 1.8% accuracy on cognitive load classification without personalization, increasing to 91.8 ± 1.2% with K = 20 calibration samples.
Under the leave-one-subject-out (LOSO) protocol, obtained 77.8 ± 2.6% accuracy without personalization and 84.0 ± 1.8% with K = 20, a 6.2 percentage-point gain from personalization.
Demonstrated robustness to sensor noise and reached 81.5 ± 2.3% LOSO accuracy with only K = 5 calibration samples (10 s of calibration per new driver).
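
The leave-one-subject-out protocol named in the key points can be illustrated with a short sketch. This is not the authors' code: the function name `loso_folds` and the toy subject IDs are assumptions made for illustration. Each subject is held out once as the test set, so no window from the test driver is ever seen during training.

```python
from collections import defaultdict


def loso_folds(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject.

    subject_ids[i] is the subject that produced sample (window) i.
    """
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)

    for held_out in sorted(by_subject):
        test_idx = by_subject[held_out]
        # Training set: every window from every other subject.
        train_idx = sorted(
            i for sid, idxs in by_subject.items() if sid != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Toy example (hypothetical): 6 windows recorded from 3 subjects.
subjects = ["s01", "s01", "s02", "s03", "s03", "s02"]
folds = list(loso_folds(subjects))
```

With the 21 participants of CL-Drive, the same loop would produce 21 folds, and reported accuracy would typically be the mean and standard deviation across those folds.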

Abstract

Road accidents caused by driver fatigue and cognitive overload remain a significant public safety concern. According to recent traffic safety data, drowsy driving contributes to thousands of fatal accidents each year, emphasizing the urgent need for intelligent driver monitoring systems. To address this, we propose an adaptive multimodal deep learning framework (AML) for real-time cognitive workload assessment and fatigue detection, leveraging the CL-Drive dataset: a multimodal repository of EEG (cognitive load), ECG (cardiac activity), EDA (electrodermal arousal), and gaze tracking (visual attention) captured from 21 participants during simulated driving across nine scenarios of escalating complexity. Our framework integrates a hybrid CNN–BiLSTM architecture to extract spatiotemporal features from raw physiological signals and gaze sequences, capturing localized spatial patterns and long-term temporal dynamics. These features are fused using a transformer-based network with cross-modal attention, which models interactions between modalities (e.g., correlating gaze fixation losses with EEG theta-band surges during distraction) and yields a 3.6 percentage-point absolute accuracy improvement over the strongest conventional fusion baseline under identical evaluation. To address individual variability and privacy, we combine personalized meta-learning, which adapts to new drivers with as few as five windowed samples (10 s of synchronized multimodal data) via episodic fine-tuning, with federated optimization, which enables decentralized model updates and reduces per-client data transfer by 38% through adaptive gradient compression. Experiments on CL-Drive demonstrate state-of-the-art performance under strictly cross-subject evaluation. Under subject-independent 5-fold cross-validation, AML achieves 80.5 ± 1.8% accuracy on binary cognitive load classification without personalization, rising to 91.8 ± 1.2% with K = 20 calibration samples (40 s). Under the more rigorous leave-one-subject-out (LOSO) protocol, AML reaches 77.8 ± 2.6% without personalization and 84.0 ± 1.8% with K = 20 personalization, an improvement of 1.6 percentage points over the strongest published LOSO baseline on this dataset, with a further 6.2–11.3 percentage points gained from personalization alone across the LOSO and 5-fold protocols. The framework exhibits robustness to real-world sensor noise (e.g., EEG/EDA motion artifacts) and achieves 81.5 ± 2.3% LOSO accuracy with only K = 5 samples (10 s of calibration per new driver), critical for scalable in-vehicle deployment. By enabling privacy-aware, real-time monitoring of driver states, this work advances intelligent vehicle safety systems and provides a blueprint for adaptive multimodal learning in human-centric AI applications.
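
To make the idea of cross-modal attention concrete, here is a minimal sketch of single-head scaled dot-product attention in which features from one modality (the queries) attend over features from another (the keys and values). It is an illustration only: the abstract does not specify the authors' layer design, the learned projection matrices and multiple heads of a real transformer are omitted, and the function names and toy gaze/EEG feature vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Each query row attends over all key rows and returns a weighted sum of values.

    queries: list of d-dim vectors from modality A (e.g. gaze time steps)
    keys, values: lists of vectors from modality B (e.g. EEG time steps)
    """
    d = len(keys[0])
    attended, weights = [], []
    for q in queries:
        # Scaled dot-product similarity between this query and every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        attended.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return attended, weights


# Hypothetical 2-dim features: 2 gaze time steps query 3 EEG time steps.
gaze = [[1.0, 0.0], [0.0, 1.0]]
eeg = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, attn = cross_attention(gaze, eeg, eeg)
```

Each row of `attn` is a probability distribution over the EEG time steps, so it shows which EEG steps a given gaze step weights most heavily; `fused` holds the resulting EEG-informed representation for each gaze step.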

Cite this study

Abinaya et al. (Wed,) studied this question.

synapsesocial.com/papers/69fd7ee0bfa21ec5bbf0727a — DOI: https://doi.org/10.1038/s41598-026-51635-3

Authors

G. Abinaya

Saveetha University

K. Dinakaran

Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology

Journals

Scientific Reports

Institutions

Saveetha University

National Institute of Technology Meghalaya
