Multichannel speech enhancement in complex noise environments remains a formidable challenge, because a system must adapt to multiple noise types and handle nonlinear distortions, among other factors. Current methods operating in the time and frequency domains often produce inaccurate predictions or distortions. In this paper, we propose a self-attentive multi-feature fusion network to enhance the quality of multichannel speech in noisy environments. The method extracts spatial and spectral features simultaneously and fuses them through a self-attention mechanism. Experimental results demonstrate that our approach achieves significant improvements over existing methods, effectively enhancing speech quality in noisy environments. Compared with other state-of-the-art multichannel enhancement models, the signal-to-distortion ratio (SDR) improves by 0.4.
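The abstract does not specify the network's layers, so the following is only a minimal sketch of the general idea of fusing spatial and spectral features with self-attention, written in NumPy. It assumes per-frame feature matrices of shape (frames, dims); the function names `self_attention_fusion` and `softmax`, the choice to concatenate the two feature sets before a single scaled dot-product attention over time frames, and the random projection weights are all hypothetical illustrations, not the authors' design.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention_fusion(spatial, spectral, w_q, w_k, w_v):
    """Hypothetical fusion step (not the paper's exact architecture).

    spatial:  (T, d_sp)   per-frame spatial features (assumed, e.g. inter-channel cues)
    spectral: (T, d_spec) per-frame spectral features (assumed, e.g. log-magnitude)
    w_q, w_k, w_v: (d_sp + d_spec, d_model) projection matrices
    Returns the fused features (T, d_model) and the attention weights (T, T).
    """
    # Concatenate both feature types along the feature axis.
    x = np.concatenate([spatial, spectral], axis=-1)
    # Project into queries, keys, and values.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Scaled dot-product attention across time frames.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d_sp, d_spec, d_model = 100, 8, 257, 64
    spatial = rng.standard_normal((T, d_sp))
    spectral = rng.standard_normal((T, d_spec))
    d_in = d_sp + d_spec
    w_q, w_k, w_v = (rng.standard_normal((d_in, d_model)) / np.sqrt(d_in) for _ in range(3))
    fused, weights = self_attention_fusion(spatial, spectral, w_q, w_k, w_v)
    print(fused.shape)  # (100, 64)
```

Concatenating before attention lets every output frame weigh both spatial and spectral evidence from all other frames; a real model might instead use separate encoders, cross-attention, or multiple heads, which the abstract does not indicate either way.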
Ruizhe Wang (Sun, …) studied this question.