Rapid serial visual presentation (RSVP) enables efficient electroencephalography (EEG)-based brain-computer interfaces, yet single-trial decoding remains difficult because of signal overlap and multi-component entanglement. This work develops DisCo-Former, a Transformer-based framework with three prior-guided components: trend-periodicity disentanglement, channel-level embeddings that preserve global temporal patterns, and contrastive learning that exploits target-adjacent non-targets. Although DisCo-Former surpassed existing approaches, analysis revealed a consistent attention collapse: attention maps became nearly uniform, and value projection weights shrank toward zero. Removing the Transformer encoder yields DisCo-MLP, a pure multilayer perceptron (MLP) variant that preserves all remaining modules. Across two datasets and three evaluation regimes, DisCo-MLP matched or outperformed its Transformer-based counterpart. In within-subject decoding, mean AUCs ranged from approximately 0.94 to 0.98 across the two datasets, consistently exceeding strong baselines. These results indicate that, for RSVP-EEG decoding, effectiveness stems less from architectural complexity than from modeling the structure of the signal. Simplicity motivated by paradigm-specific neurophysiological priors offers a practical path to state-of-the-art performance in EEG-based interfaces.
Zhang et al. (Fri,) studied this question.
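
The attention collapse described above (near-uniform attention maps, value projection weights shrinking toward zero) can be checked with simple diagnostics. The following is a minimal sketch, not the authors' analysis: the abstract does not state how collapse was measured, so the normalized-entropy score, the function name `normalized_attention_entropy`, and the toy arrays are all assumptions made for illustration. A score near 1 means each query attends almost uniformly over all keys; a score near 0 means attention is sharply peaked.

```python
import numpy as np

def normalized_attention_entropy(attn, eps=1e-12):
    """Mean row entropy of an attention map, divided by log(n_keys).

    attn: array of shape (..., n_queries, n_keys) whose rows sum to 1.
    Returns a value close to 1.0 for uniform rows and close to 0.0 for
    one-hot rows. (Hypothetical diagnostic, assumed for illustration.)
    """
    attn = np.asarray(attn, dtype=float)
    n_keys = attn.shape[-1]
    # Shannon entropy of each query's attention distribution.
    row_entropy = -np.sum(attn * np.log(attn + eps), axis=-1)
    # Normalize by the maximum possible entropy, log(n_keys).
    return float(row_entropy.mean() / np.log(n_keys))

# Toy attention maps: 4 heads, 8 queries, 8 keys (assumed sizes).
uniform = np.full((4, 8, 8), 1.0 / 8)      # collapsed: every row is uniform
peaked = np.tile(np.eye(8), (4, 1, 1))     # each query attends to one key

print(normalized_attention_entropy(uniform))
print(normalized_attention_entropy(peaked))
```

A complementary check, under the same assumptions, is to track the Frobenius norm of the value projection matrix over training (for example with `np.linalg.norm`); a norm that decays toward zero alongside an entropy score near 1 would be consistent with the collapse reported here.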