To assist humans efficiently in various tasks, it is crucial to accurately decode and understand the rich information embedded in the brain's visual cognition. Existing brain-driven research often fails to overcome the challenge of small target data domains. In addition, the lack of explicit semantic, spatial, and other informational constraints on feature extractors prevents brain decoding models from learning uniform cross-domain representations, which degrades their performance in unseen domains. To overcome these limitations, we propose DAMind, a multimodal EEG-based model for robust cross-domain visual alignment and decoding. Our approach integrates a vision-language model (VLM) with brain-inspired cognitive mechanisms. It leverages the VLM's strong image-text representation abilities to learn both fine-grained primary visual features and high-level semantic concepts from neural signals, and it provides effective visual fine-tuning through a visual guidance mechanism. DAMind introduces a stepwise EEG encoding process aligned with visual processing and employs an instruction-based learning strategy for effective cross-domain zero-shot transfer. Its robust architecture achieves good generalization performance efficiently, enabling EEG signals from multiple domains to be mapped to a unified learning domain. We construct EBench, a comprehensive EEG decoding benchmark. DAMind achieves state-of-the-art results on several visual tasks and outperforms the baseline in the zero-shot setting.
Jing et al. (Thu,) studied this question.