Background/Objectives: Electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) can objectively capture the spatiotemporal dynamics of brain activity during affective cognition, and their combination is promising for improving emotion recognition. However, multi-modal cross-subject emotion recognition remains challenging for two reasons: heterogeneous signal characteristics hinder effective fusion, and substantial inter-subject variability degrades generalization to unseen subjects. Methods: To address these issues, this paper proposes DC-AGIN, a dual-contrastive learning attention graph isomorphism network for EEG-fNIRS emotion recognition. DC-AGIN employs an attention-weighted AGIN encoder to adaptively emphasize informative brain-region topology while suppressing redundant connectivity noise. For cross-modal fusion, a cross-modal contrastive learning module projects EEG and fNIRS representations into a shared latent semantic space, promoting semantic alignment and complementarity across modalities. To further enhance cross-subject generalization, a supervised contrastive learning mechanism is introduced to explicitly suppress subject-specific identity information and encourage subject-invariant affective representations. Results: Experiments on a self-collected dataset are conducted under both subject-dependent five-fold cross-validation and subject-independent leave-one-subject-out (LOSO) protocols. The proposed method achieves 96.98% accuracy in four-class classification in the subject-dependent setting and 62.56% under LOSO. Compared with existing models, DC-AGIN achieves state-of-the-art performance. Conclusions: These results demonstrate that combining attention-based aggregation with cross-modal and cross-subject contrastive learning enables more robust EEG-fNIRS emotion recognition, supporting the effectiveness of DC-AGIN for generalizable emotion representation learning.
Yu et al. (Wed,) studied this question.
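The subject-independent leave-one-subject-out (LOSO) protocol mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `loso_splits` and the toy subject labels are hypothetical, and the sketch only shows how each fold holds out every sample from one subject so that the test subject is never seen during training.

```python
from collections import defaultdict


def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject.

    subject_ids: sequence giving the subject label of each sample.
    Hypothetical helper for illustration only.
    """
    # Group sample indices by the subject they belong to.
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)

    # Each subject is held out exactly once; all others form the training set.
    for held_out in sorted(by_subject):
        test_idx = by_subject[held_out]
        train_idx = sorted(
            i
            for sid, idxs in by_subject.items()
            if sid != held_out
            for i in idxs
        )
        yield held_out, train_idx, test_idx


# Toy example (assumed data): six samples from three subjects.
subject_ids = ["s1", "s1", "s2", "s3", "s3", "s3"]
folds = list(loso_splits(subject_ids))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Under this kind of split, a reported LOSO accuracy would typically be the average over folds, which is why it tends to be lower than subject-dependent cross-validation, where samples from the same subject appear in both training and test sets.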