What does this research mean for the field?

The Spatiotemporal Hybrid Attention Network (STHANet) decodes auditory attention from EEG signals and remains significantly above chance even when gaze-related confounds are strictly controlled, a condition under which the other direct AAD models evaluated fall to near-chance performance. Novelty: methodological. Consensus alignment: neutral.

What question did this study set out to answer?

This research aims to improve auditory attention decoding performance using a novel model, STHANet.

May 18, 2026 · Open Access

STHANet: Spatiotemporal Hybrid Attention Network for auditory attention decoding

Key Points

This research aims to improve auditory attention decoding performance using a novel model, STHANet.
Developed STHANet, a dual-branch model integrating spatial filtering and temporal characterization.
Conducted experiments on KUL and DTU datasets with various partitions.
Evaluated AAD model performance under controlled gaze-related conditions.
Achieved 93.6% accuracy on KUL and 75.8% on DTU under within-trial partitioning.
Under strict cross-trial partitioning, accuracies were 76.7% on KUL and 66.1% on DTU.
STHANet maintained performance significantly above chance in challenging gaze-incongruent scenarios.
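To make "significantly above chance" concrete, the sketch below computes a one-sided binomial p-value for a two-speaker task, where chance accuracy is 0.5. This is an illustration only: the summary does not state which statistical test the authors used, and the function name `binomial_p_value` and its parameters are hypothetical. It also assumes decision windows are independent, which is optimistic for overlapping or same-trial windows.

```python
import math


def binomial_p_value(n_correct, n_windows, p_chance=0.5):
    """One-sided P(X >= n_correct) for X ~ Binomial(n_windows, p_chance).

    Assumption: each decision window is an independent Bernoulli trial.
    This is NOT necessarily the test used in the paper.
    """
    log_p = math.log(p_chance)
    log_q = math.log(1.0 - p_chance)
    total = 0.0
    for k in range(n_correct, n_windows + 1):
        # Log of C(n, k) * p^k * (1 - p)^(n - k), computed via lgamma
        # so that large window counts do not overflow.
        log_term = (
            math.lgamma(n_windows + 1)
            - math.lgamma(k + 1)
            - math.lgamma(n_windows - k + 1)
            + k * log_p
            + (n_windows - k) * log_q
        )
        total += math.exp(log_term)
    return min(total, 1.0)


# Hypothetical counts, not results from the paper.
p = binomial_p_value(n_correct=60, n_windows=100)
print(f"p-value for 60/100 correct windows: {p:.4f}")
```

A small p-value indicates accuracy that is unlikely under pure guessing. With few test windows, even an accuracy several points above 50% may fail this check, which is why near-chance results need a test rather than a visual comparison.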

Abstract

Auditory attention decoding (AAD) aims to detect the target speaker from electroencephalography (EEG) signals in multi-talker environments. Existing methods often insufficiently exploit joint spatial and temporal information, which limits decoding performance. This paper presents STHANet (spatiotemporal hybrid attention network), a dual-branch model that integrates depth-wise spatial filtering, log-variance temporal characterization, and transformer-based spatiotemporal fusion. Experiments on the KUL and DTU datasets show that STHANet achieves competitive performance with a 1-s decision window, reaching accuracies of 93.6% and 75.8% under within-trial partitioning and 76.7% and 66.1% under strict cross-trial partitioning, respectively. Further evaluation on the AV-GC-AAD dataset under moving-target gaze-incongruent conditions shows that all evaluated direct AAD models decrease to near-chance performance when gaze-related shortcuts are more strictly controlled, whereas only STHANet remains significantly above chance. These findings support the effectiveness of STHANet for spatiotemporal EEG feature extraction and highlight the importance of controlling both data-partitioning bias and gaze-related confounds in direct AAD.
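The gap between within-trial and cross-trial accuracy comes down to whether windows from the same recording trial can appear in both the training and test sets. The following is a minimal sketch of the two partitioning schemes, assuming each decision window carries a trial identifier. The function names, the 20% test fraction, and the choice to hold out the final windows of each trial are illustrative assumptions, not the paper's protocol.

```python
def within_trial_split(trial_ids, test_fraction=0.2):
    """Hold out the last windows of every trial.

    Every trial contributes to both train and test, so trial-specific
    signal properties can leak across the split.
    """
    by_trial = {}
    for idx, trial in enumerate(trial_ids):
        by_trial.setdefault(trial, []).append(idx)

    train, test = [], []
    for indices in by_trial.values():
        n_test = max(1, int(round(len(indices) * test_fraction)))
        cut = len(indices) - n_test
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


def cross_trial_split(trial_ids, test_trials):
    """Hold out whole trials, so no trial appears on both sides."""
    held_out = set(test_trials)
    train = [i for i, t in enumerate(trial_ids) if t not in held_out]
    test = [i for i, t in enumerate(trial_ids) if t in held_out]
    return train, test


# Two hypothetical trials with five decision windows each.
trial_ids = [0] * 5 + [1] * 5

w_train, w_test = within_trial_split(trial_ids)
c_train, c_test = cross_trial_split(trial_ids, test_trials=[1])

shared_within = {trial_ids[i] for i in w_train} & {trial_ids[i] for i in w_test}
shared_cross = {trial_ids[i] for i in c_train} & {trial_ids[i] for i in c_test}
print("trials shared under within-trial split:", sorted(shared_within))
print("trials shared under cross-trial split:", sorted(shared_cross))
```

Under the within-trial split both trials appear on both sides, while the cross-trial split shares none. A model can therefore score higher under within-trial partitioning by picking up trial-specific characteristics rather than attention itself, which is consistent with the lower cross-trial accuracies reported above.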

Cite This Study

Xu et al. studied this question.

synapsesocial.com/papers/6a0aacb35ba8ef6d83b70032 https://doi.org/10.1002/jim4.70038
