Emotion recognition from electroencephalogram (EEG) signals is crucial for brain–computer interfaces (BCIs), yet existing methods often struggle to fuse heterogeneous features and to capture long-range temporal dependencies. To address these challenges, we propose MAF-TransNet, a unified spatiotemporal framework. First, parallel fully connected neural network (FCNN) modules non-linearly align heterogeneous differential entropy (DE) and power spectral density (PSD) features. Next, an Adaptive Channel-wise Feature Encoder (ACFE) recalibrates spatial–spectral responses to highlight emotion-relevant cortical activations. Finally, a Transformer encoder models the global temporal evolution of emotional states. Evaluated on the SEED-IV and DEAP datasets, MAF-TransNet achieves subject-dependent (SD) accuracies of 88.80% and 96.58%, respectively, alongside robust subject-independent (SI) performance. Furthermore, Granger causality analysis reveals distinct emotion-dependent prefrontal asymmetry, and t-SNE visualizations suggest the formation of a highly discriminative, well-separated feature manifold. By unifying local spatial–spectral extraction with global temporal modeling, MAF-TransNet provides an accurate and robust approach and offers preliminary insights into the spatiotemporal dynamics of emotion for future affective BCI applications.
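To make the data flow through the three stages concrete, the following is a minimal NumPy sketch of a forward pass, not the authors' implementation. Several choices are assumptions made purely for illustration: the ACFE is approximated by a squeeze-and-excitation-style channel gate, the Transformer encoder is reduced to a single-head self-attention layer with a residual connection (no positional encoding, layer normalization, or feed-forward sublayer), the two branches are fused by concatenation, and all sizes, weights, and function names (`dense`, `channel_gate`, `self_attention`, `forward`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    """Fully connected layer with ReLU, applied over the last axis."""
    return np.maximum(x @ w + b, 0.0)

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_gate(x, w1, w2):
    """SE-style channel recalibration (assumed stand-in for the ACFE).

    x has shape (T, C, F): time windows, EEG channels, features.
    """
    squeezed = x.mean(axis=(0, 2))                # (C,) one summary per channel
    hidden = np.maximum(squeezed @ w1, 0.0)       # bottleneck
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # (C,) weights in (0, 1)
    return x * gate[None, :, None]

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over time; x is (T, E)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v

def init_params(C, B, D, E, K, r=2):
    """Random illustrative weights; r is the gate's reduction ratio."""
    def w(*shape):
        return rng.normal(0.0, 0.1, size=shape)
    return {
        "w_de": w(B, D), "b_de": np.zeros(D),
        "w_psd": w(B, D), "b_psd": np.zeros(D),
        "w_g1": w(C, C // r), "w_g2": w(C // r, C),
        "w_tok": w(C * 2 * D, E),
        "wq": w(E, E), "wk": w(E, E), "wv": w(E, E),
        "w_out": w(E, K),
    }

def forward(de, psd, p):
    """de and psd have shape (T, C, B); returns class probabilities (K,)."""
    h_de = dense(de, p["w_de"], p["b_de"])            # (T, C, D) DE branch
    h_psd = dense(psd, p["w_psd"], p["b_psd"])        # (T, C, D) PSD branch
    fused = np.concatenate([h_de, h_psd], axis=-1)    # (T, C, 2D)
    gated = channel_gate(fused, p["w_g1"], p["w_g2"]) # (T, C, 2D)
    tokens = gated.reshape(gated.shape[0], -1) @ p["w_tok"]  # (T, E)
    attended = tokens + self_attention(tokens, p["wq"], p["wk"], p["wv"])
    return softmax(attended.mean(axis=0) @ p["w_out"])

# Hypothetical sizes: windows, channels, bands, branch width, embedding, classes
T, C, B, D, E, K = 6, 62, 5, 8, 16, 4
params = init_params(C, B, D, E, K)
de = rng.normal(size=(T, C, B))    # synthetic stand-in for DE features
psd = rng.normal(size=(T, C, B))   # synthetic stand-in for PSD features
probs = forward(de, psd, params)
print(probs.shape, float(probs.sum()))
```

The inputs here are synthetic noise and the weights are untrained, so the output probabilities carry no meaning; the sketch only shows how per-band features could pass through feature alignment, channel recalibration, and temporal attention in sequence.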
Li et al. (Sun,) studied this question.