Objectives: This work aims to enhance foreground speech by removing unwanted background noise and recovering the desired signal, using deep learning approaches with limited training data.

Methods: The study proposes a transfer learning approach that builds on a pre-trained residual network (based on the wav2vec2 model) and includes a statistics pooling layer of the kind used in speaker recognition. The model is then trained on a limited amount of clean and noisy data. In addition, log mel-spectrogram features are adopted to improve the generalization of the speech enhancement model. The data come from the Noisy Speech Database curated by Valentini-Botinhao (2017) and the LibriSpeech corpus.

Findings: On the same dataset, the proposed model was compared with two baselines, an autoencoder and a multilayer autoencoder. With an STOI score of 0.88 and an SNR improvement of 3.27 dB, the proposed approach outperforms both baselines in subjective and objective evaluation.

Novelty: This work eliminates signal truncation, a constraint observed in conventional speech enhancement pipelines, by integrating a statistics pooling layer with a pre-trained wav2vec2-based residual network so that variable-length inputs can be handled. Furthermore, the use of log mel-spectrograms improves the model's robustness and flexibility, allowing it to produce state-of-the-art results even with sparse supervised training data.

Keywords: Denoise, Mel-spectrogram, Signal Processing, Transfer Learning, Wav2Vec2
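To illustrate two ideas the abstract relies on, the following is a minimal NumPy sketch, not the authors' implementation. It assumes statistics pooling means concatenating the per-dimension mean and standard deviation over time (the usual form in speaker recognition), and it assumes SNR improvement is the output SNR minus the input SNR, both measured against the clean reference; the abstract does not state either definition. The function names `statistics_pooling`, `snr_db`, and `snr_improvement_db` are hypothetical, and random arrays stand in for the frame-level features a wav2vec2-based network would produce.

```python
import numpy as np


def statistics_pooling(frames: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Collapse (num_frames, feat_dim) features into one (2 * feat_dim,) vector.

    Assumed form: mean and standard deviation over the time axis, concatenated.
    The output size does not depend on num_frames, so utterances of any
    length can be processed without truncation or padding.
    """
    if frames.ndim != 2 or frames.shape[0] == 0:
        raise ValueError("expected a non-empty (num_frames, feat_dim) array")
    mean = frames.mean(axis=0)
    std = np.sqrt(frames.var(axis=0) + eps)  # eps keeps the sqrt stable
    return np.concatenate([mean, std])


def snr_db(clean: np.ndarray, estimate: np.ndarray, eps: float = 1e-12) -> float:
    """Signal-to-noise ratio in dB of `estimate` relative to `clean`."""
    residual = estimate - clean
    return float(10.0 * np.log10(np.sum(clean**2) / (np.sum(residual**2) + eps)))


def snr_improvement_db(clean: np.ndarray, noisy: np.ndarray, enhanced: np.ndarray) -> float:
    """Assumed definition: SNR after enhancement minus SNR before it."""
    return snr_db(clean, enhanced) - snr_db(clean, noisy)


# Placeholder frame-level features for a short and a long utterance.
rng = np.random.default_rng(0)
short_utt = rng.standard_normal((120, 64))
long_utt = rng.standard_normal((950, 64))

pooled_short = statistics_pooling(short_utt)
pooled_long = statistics_pooling(long_utt)
print(pooled_short.shape, pooled_long.shape)  # (128,) (128,)
```

Both utterances map to a vector of the same size, which is the property that lets a downstream layer accept inputs of different durations.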
Debabrata Gogoi
Sushanta Kabir Dutta
Indian Journal of Science and Technology
www.synapsesocial.com/papers/68e25378d6d66a53c24740cf — DOI: https://doi.org/10.17485/ijst/v18i35.2678