Cross-modal knowledge distillation from an EEG and PA-based multimodal teacher model improved the performance of a pupil area (PA) unimodal model, achieving higher classification accuracy and F1 score in depression recognition than the baselines.
Does a cross-modal knowledge distillation method improve depression recognition performance in a PA-based unimodal model compared to standard unimodal training?
Cross-modal knowledge distillation from an EEG/PA multimodal teacher to a PA unimodal student improves depression recognition while reducing data acquisition challenges, with mechanisms clarified by a novel Entropy-GradCAM explainability method.
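This summary does not state the distillation objective. One widely used formulation, assumed here only for illustration, combines a temperature-scaled KL divergence between teacher and student output distributions with an ordinary cross-entropy on the true labels. In the cross-modal setting, the teacher's logits come from EEG and PA inputs while the student sees only PA for the same trial. The names `distillation_loss`, `temperature` and `alpha` are hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Row-wise softmax with temperature scaling (shifted for numerical stability)."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Assumed KD objective: alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE.

    student_logits: outputs of the PA-only student, shape (batch, classes)
    teacher_logits: outputs of the EEG+PA teacher for the same trials
    labels: integer labels (e.g. 0 = healthy control, 1 = depression)
    temperature, alpha: hypothetical hyperparameters, not from the paper
    """
    eps = 1e-12
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    # KL divergence on the softened distributions, averaged over the batch.
    kl = np.sum(p_teacher * (np.log(p_teacher + eps) - np.log(p_student_soft + eps)),
                axis=-1).mean()
    # Cross-entropy of the (unsoftened) student against the ground-truth labels.
    p_student = softmax(student_logits, 1.0)
    ce = -np.log(p_student[np.arange(len(labels)), labels] + eps).mean()
    # Scaling by T^2 keeps the soft-target term on a comparable gradient scale.
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce
```

If the student already matches the teacher, the KL term vanishes and only the hard-label term remains. The further the PA-only student's predictions drift from the multimodal teacher's, the larger the loss becomes.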
Multimodal physiological signals provide a more reliable data source for depression detection than single-modality signals. For instance, combining electroencephalography (EEG) and pupil area (PA) signals can enhance depression recognition. However, EEG acquisition is challenging, which limits the practical use of EEG-based multimodal approaches, while PA signals are more accessible. Additionally, existing explainability methods for time series models can quantify the contribution of each feature, but they often fail to explain how these contributions drive performance improvements, which limits insight into the underlying mechanisms. To address these limitations and enhance the generalizability of PA-based depression detection models, this paper proposes a cross-modal knowledge distillation method that uses an EEG and PA-based multimodal teacher model and a PA-based unimodal student model. Through knowledge distillation, complex multimodal features are transferred to the PA-based model, enhancing its performance. We also introduce Entropy-GradCAM (E-GCAM), an explainability method combining information entropy and gradient-weighted class activation mapping (Grad-CAM), to clarify the mechanisms behind the student model’s performance gains. Quantitative results show that knowledge-distilled time series models encode more useful information, consistent with the observed student model improvements. Experimental results demonstrate that the proposed method achieves the best performance among the compared methods on two datasets, effectively reducing reliance on multimodal data and increasing the practicality of depression recognition models.
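The abstract says E-GCAM combines information entropy with Grad-CAM, but it does not give the formula. A minimal sketch, assuming one plausible combination, is shown here. It computes a 1-D Grad-CAM relevance curve over the time steps of a signal, normalises that curve into a probability distribution, and reports its Shannon entropy as a measure of how concentrated the attribution is. The functions `grad_cam_1d` and `cam_entropy` are hypothetical, and the paper's actual E-GCAM definition may differ. The activations and gradients are assumed to come from a convolutional layer via an autodiff framework.

```python
import numpy as np


def grad_cam_1d(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM over time for a 1-D convolutional layer.

    activations: feature maps A, shape (channels, time)
    gradients:   d(class score)/dA, same shape (from an autodiff framework)
    Returns a non-negative relevance value per time step.
    """
    # Channel weights: gradients average-pooled over the time axis.
    weights = gradients.mean(axis=1)
    # Weighted sum of feature maps, then ReLU to keep only positive evidence.
    return np.maximum((weights[:, None] * activations).sum(axis=0), 0.0)


def cam_entropy(cam: np.ndarray) -> float:
    """Shannon entropy (bits) of a CAM normalised to a probability distribution.

    Low entropy: relevance concentrated on a few time steps.
    High entropy: relevance spread across the whole signal.
    """
    total = cam.sum()
    if total <= 0:
        return float("nan")  # entropy is undefined for an all-zero map
    p = cam / total
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-(p * np.log2(p)).sum())
```

A uniform relevance curve over 8 time steps has the maximum entropy of 3 bits, and a curve with a single non-zero step has entropy 0. Comparing such scores between teacher, distilled student and baseline student is one way the "more useful information" claim could be probed. This is an assumption about the method, not a description of it.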
Li et al. (Sun) conducted a study (design classified as "other") in patients with depression and healthy controls, assessed with EEG and pupil area signals for depression recognition (n=140). A PA-based unimodal student model trained with cross-modal knowledge distillation from an EEG and PA-based multimodal teacher model was compared with a baseline PA-based unimodal model and other unimodal baseline methods. Outcomes were depression recognition classification accuracy and F1 score on pupil area data. The distilled student achieved higher classification accuracy and F1 score than these baselines.
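The two outcome measures are classification accuracy and F1 score. This summary does not say whether F1 is computed on the depression class alone or macro-averaged. The sketch assumes binary F1 with depression as the positive class. The function name `accuracy_and_f1` is hypothetical.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy and binary F1 (positive class assumed to be depression)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # F1 = 2TP / (2TP + FP + FN); set to 0 when there are no positives at all.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1
```

For example, labels `[1, 1, 0, 0, 1]` and predictions `[1, 0, 0, 1, 1]` give 3 of 5 correct (accuracy 0.6), with TP = 2, FP = 1 and FN = 1, so F1 = 4 / 6 ≈ 0.667.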