Background noise in acoustic signals such as speech and audio degrades listening quality and causes hearing fatigue. Conventional enhancement methods work best under high signal-to-noise ratio (SNR) conditions. Deep neural networks have delivered significant performance improvements in image processing and speech recognition, which motivates their use for denoising speech signals corrupted by multiple noise types under low SNR conditions (0 dB). This study applied two families of deep neural networks, convolutional neural networks and deep generative networks, to remove background noise from speech signals under low SNR conditions. The noise reduction networks were trained to estimate the noise present in the signal, and this estimate was then subtracted to obtain the denoised speech. Two convolutional architectures, the UNet and the Convolutional Encoder-Decoder network (CED), and two deep generative networks, the Vector Quantized Variational Autoencoder (VQVAE) and the Variational Autoencoder (VAE), were trained on STFT magnitude features of noisy signal frames. Four objective measures were used to assess the quality of the enhanced speech: Perceptual Evaluation of Speech Quality (PESQ), Short Time Objective Intelligibility (STOI), Segmental Signal to Noise Ratio (SSNR), and improvement in SNR. Spectral subtraction and logMMSE served as baselines for evaluating these networks on two datasets. The comparative analysis supports the superiority of CED for denoising and enhancing speech corrupted by multiple noise types under low SNR conditions, for both seen and unseen noise, with far fewer model parameters than the other methods.
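The estimate-then-subtract step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the frame length and hop size, the synthetic tone standing in for speech, the white noise, and above all the "oracle" noise magnitude, which is used only as a placeholder where a trained network's noise estimate would go. The abstract does not state the exact subtraction domain or STFT settings, so the magnitude-domain subtraction with flooring at zero is one plausible reading.

```python
import numpy as np

def mix_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so that clean + scaled noise has the requested SNR (dB)."""
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (target_snr_db / 10)))
    scaled_noise = scale * noise
    return clean + scaled_noise, scaled_noise

def snr_in_db(reference, estimate):
    """SNR of `estimate` with respect to `reference`, in dB."""
    err = reference - estimate
    return 10 * np.log10(np.sum(reference ** 2) / np.sum(err ** 2))

def stft_magnitude(x, frame_len=256, hop=128):
    """Magnitude of a Hann-windowed STFT (frame_len and hop are assumed values)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def subtract_noise_magnitude(noisy_mag, noise_mag_estimate):
    """Subtract the estimated noise magnitude, flooring negative values at zero."""
    return np.maximum(noisy_mag - noise_mag_estimate, 0.0)

# Synthetic stand-ins: a 440 Hz tone for speech and white Gaussian noise.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)
noisy, scaled_noise = mix_at_snr(clean, rng.standard_normal(fs), target_snr_db=0.0)

clean_mag = stft_magnitude(clean)
noisy_mag = stft_magnitude(noisy)

# Placeholder for a network output: the true noise magnitude (an oracle).
noise_mag_estimate = stft_magnitude(scaled_noise)
enhanced_mag = subtract_noise_magnitude(noisy_mag, noise_mag_estimate)

# Squared magnitude error against the clean spectrum, before and after.
err_noisy = np.sum((noisy_mag - clean_mag) ** 2)
err_enhanced = np.sum((enhanced_mag - clean_mag) ** 2)
print("input SNR (dB):", snr_in_db(clean, noisy))
print("magnitude error before / after:", err_noisy, err_enhanced)
```

In a real system the oracle line would be replaced by the prediction of a trained model, and the enhanced magnitude would typically be combined with a phase estimate (often the noisy phase) and inverted back to a waveform before computing PESQ, STOI, or SSNR.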
Shibani Kar
Sambalpur University
V. Mukherjee
Sambalpur University
Engineering Technology & Applied Science Research
Kar et al. studied this question.
synapsesocial.com/papers/68c1ac0954b1d3bfb60e4953 — DOI: https://doi.org/10.48084/etasr.10571