March 18, 2024 | Open Access

Diffusion-Based Speech Enhancement with a Weighted Generative-Supervised Learning Loss

Abstract

Diffusion-based generative models have recently gained attention in speech enhancement (SE), providing an alternative to conventional supervised methods. These models transform clean speech training samples into Gaussian noise, usually centered on noisy speech, and subsequently learn a parameterized model to reverse this process, conditioned on noisy speech. Unlike supervised methods, generative SE approaches often rely solely on an unsupervised loss, which may result in less efficient use of the conditioning noisy speech. To address this issue, we propose augmenting the original diffusion training objective with an ℓ2 loss that measures the discrepancy between the ground-truth clean speech and its estimate at each diffusion time-step. Experimental results demonstrate the effectiveness of the proposed methodology.
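The combined objective described in the abstract can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the exponential drift of the mean from clean speech toward noisy speech, the fixed `sigma_t`, the denoising score-matching form of the generative term, the additive weighting by a hypothetical `alpha`, and the use of short real-valued lists in place of complex spectrograms.

```python
import math
import random

def perturb(x0, y, t, gamma=1.5, sigma_t=0.5, rng=random):
    """Sample x_t from a Gaussian whose mean drifts from clean x0 toward noisy y.

    Assumed mean: exp(-gamma*t) * x0 + (1 - exp(-gamma*t)) * y (illustrative only).
    """
    decay = math.exp(-gamma * t)
    z = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [decay * a + (1.0 - decay) * b + sigma_t * n
           for a, b, n in zip(x0, y, z)]
    return x_t, z

def estimate_clean(x_t, y, score, t, gamma=1.5, sigma_t=0.5):
    """Recover a clean-speech estimate from a score estimate.

    Under the assumed perturbation, the true score is -z / sigma_t, so
    x_t + sigma_t**2 * score approximates the mean, which is then inverted for x0.
    """
    decay = math.exp(-gamma * t)
    return [(xt + sigma_t ** 2 * s - (1.0 - decay) * b) / decay
            for xt, s, b in zip(x_t, score, y)]

def weighted_loss(score, z, x0, x0_hat, sigma_t=0.5, alpha=0.5):
    """Generative (score-matching) term plus alpha times a supervised l2 term."""
    generative = sum((sigma_t * s + n) ** 2 for s, n in zip(score, z))
    supervised = sum((a - b) ** 2 for a, b in zip(x0, x0_hat))
    return generative + alpha * supervised
```

In a real training loop, `score` would be the output of a neural network given `x_t`, `y`, and `t`. With a perfect score estimate, both terms vanish and `estimate_clean` returns the clean signal; any error in the score is penalized twice, once directly and once through the clean-speech estimate.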
