In complex acoustic scenarios, the human auditory system excels at rapidly and accurately identifying target sounds. A deeper understanding of its mechanisms in such environments could significantly enhance the robustness and generalization capabilities of sound event detection systems. This paper investigates the neural representations and decoding of target sound perception in complex auditory scenes using non-invasive neural signals. We propose a novel experimental paradigm that simulates complex acoustic conditions by varying parameters such as target sound characteristics, background noise levels, and interfering events. Multi-view neural representations, spanning the time, frequency, and source domains, are extracted and analyzed using statistical methods to examine their relationships with these variables. To decode this neural activity, we propose the Auditory Cortex-inspired Dual Attention Network (AC-DANet), an architecture functionally inspired by known attentional pathways in the auditory cortex. The model achieves robust three-class electroencephalogram (EEG) decoding of target sound perception in challenging auditory scenes, with experimental results demonstrating strong performance and cross-subject generalization. This study advances our understanding of the neural information transmission process underlying target sound perception in complex acoustic environments. It offers novel insights into the cognitive functions of the human auditory system and provides a theoretical foundation and technical framework for the development of advanced sound event detection systems in challenging acoustic settings.
Shi et al. studied this question.