Abstract: Speech enhancement aims to recover clean speech from noisy signals and thereby improve the performance of speech communication, recognition, and interaction systems. Traditional enhancement methods struggle in complex environments such as airports, where background noise is diverse and changes dynamically. Generative Adversarial Networks (GANs) have been widely used for speech enhancement, but the Speech Enhancement GAN (SEGAN) still lacks robustness in complex non-stationary noise. To address this issue, this paper proposes the SE-block Speech Enhancement Generative Adversarial Network (SSEGAN), which introduces a channel attention mechanism into the generator so that the model focuses on speech-critical signals. The mechanism applies global average pooling followed by a fully connected network to learn a weight for each feature channel, giving the generator dynamic attention over speech-critical features. By strengthening the response of important channels and suppressing redundant or noise-dominated ones, the model extracts the effective components of speech more accurately and models speech structure better. Experimental results show that SSEGAN outperforms the original SEGAN in signal-to-noise ratio (SNR) improvement, speech quality, and intelligibility. Subjective quality scores are high, the intelligibility gain is statistically significant, and inference time is reduced. These results verify the effectiveness of the channel attention mechanism in complex noise environments and suggest new directions for optimizing speech enhancement in practical applications.
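The channel attention step described in the abstract (global average pooling, then a fully connected network that yields one weight per feature channel) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the two-layer bottleneck with ReLU and sigmoid follows the commonly used squeeze-and-excitation design, and the function name `se_block`, the reduction size `R`, and the random weights are all assumptions made for illustration.

```python
import math
import random


def se_block(features, w1, b1, w2, b2):
    """Channel attention over a 1-D feature map (illustrative sketch).

    features: list of C channels, each a list of T samples.
    w1, b1:   R x C weights and R biases of the reduction layer (assumed).
    w2, b2:   C x R weights and C biases of the expansion layer (assumed).
    Returns (rescaled features, per-channel weights).
    """
    # Squeeze: global average pooling over the time axis, one value per channel.
    squeezed = [sum(channel) / len(channel) for channel in features]

    # Excitation, layer 1: fully connected + ReLU (bottleneck of size R).
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)) + b)
        for row, b in zip(w1, b1)
    ]

    # Excitation, layer 2: fully connected + sigmoid, giving weights in (0, 1).
    weights = [
        1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
        for row, b in zip(w2, b2)
    ]

    # Scale: multiply every sample of a channel by that channel's weight.
    scaled = [[wgt * x for x in channel] for wgt, channel in zip(weights, features)]
    return scaled, weights


# Toy example: 4 channels, 8 time steps, bottleneck of 2 (all values hypothetical).
random.seed(0)
C, T, R = 4, 8, 2
features = [[random.uniform(-1, 1) for _ in range(T)] for _ in range(C)]
w1 = [[random.uniform(-0.5, 0.5) for _ in range(C)] for _ in range(R)]
b1 = [0.0] * R
w2 = [[random.uniform(-0.5, 0.5) for _ in range(R)] for _ in range(C)]
b2 = [0.0] * C

scaled, weights = se_block(features, w1, b1, w2, b2)
```

In a trained model the weights `w1`, `b1`, `w2`, `b2` would be learned jointly with the generator, so channels carrying speech-relevant information would receive weights near 1 and noise-dominated channels weights near 0. Where in the generator such a block is placed is not stated in the abstract.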
Zichun Hua
Zhigang Lian
Noise Control Engineering Journal
Shanghai Dianji University
Hua et al. studied this question.
www.synapsesocial.com/papers/68c193de9b7b07f3a0617ab7 — DOI: https://doi.org/10.3397/1/377340