September 10, 2025

Semi-supervised deep monaural speech enhancement with positive-negative-unlabeled learning

Key Points

The proposed method targets high monaural speech enhancement performance even when supervised data (paired clean and noisy speech) are limited.
Unsupervised data, i.e., noisy speech without clean references, are easy to collect from sources such as smart speakers or the Web and supplement the limited supervised data.
A deep neural network predicts a binary mask by classifying each time-frequency bin as speech-dominant (positive) or noise-dominant (negative).
In the reported experiment, increasing the amount of unlabeled data improved enhancement performance and enabled the method to outperform supervised learning.

Abstract

Monaural speech enhancement (SE) is a technique for extracting a clean speech signal from a monaural noisy speech signal. Its mainstream approach, supervised learning, uses supervised data, i.e., pairs of clean and noisy speech data. However, this approach has the problem that supervised data are expensive because recording clean speech data requires a quiet environment such as a studio. In this paper, an SE method using a semi-supervised learning method called positive-negative-unlabeled (PNU) learning is proposed. To achieve high SE performance even with limited supervised data, the proposed method leverages unsupervised data, i.e., only noisy speech data. Note that unsupervised data can be easily collected, e.g., from smart speakers or the Web. In our method, a deep neural network predicts a binary mask for SE by classifying time-frequency bins as speech-dominant (positive, P) or noise-dominant (negative, N). It is trained through PNU learning using P and N data from supervised data and unlabeled (U) data from unsupervised data. An experiment confirmed that increasing U data improves the SE performance of the proposed method and enables it to outperform supervised learning.
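
To make the abstract's description more concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It shows (1) how time-frequency bins of supervised data might be labeled as speech-dominant (P) or noise-dominant (N) and how a binary mask is applied, and (2) a PNU risk in the form commonly given in the PNU classification literature, which mixes the ordinary PN risk with a PU or NU risk computed from unlabeled data. All function names are hypothetical. The strict magnitude comparison used for labeling, the sigmoid surrogate loss, the class prior `prior_p`, and the mixing weight `eta` are assumptions for illustration; the abstract does not state which choices the paper makes.

```python
import numpy as np

def label_bins(speech_mag, noise_mag):
    """Label each time-frequency bin: True = speech-dominant (P), False = noise-dominant (N).
    Assumption: a bin is speech-dominant when the speech magnitude exceeds the noise magnitude."""
    return speech_mag > noise_mag

def apply_binary_mask(noisy_spec, mask):
    """Keep bins marked speech-dominant and zero out the rest."""
    return noisy_spec * mask

def sigmoid_loss(margin):
    """Sigmoid surrogate loss l(m) = 1 / (1 + exp(m)); assumed here, not taken from the paper."""
    return 1.0 / (1.0 + np.exp(margin))

def pnu_risk(g_p, g_n, g_u, prior_p, eta):
    """PNU risk from classifier outputs g(x) on P, N, and U samples.

    prior_p: assumed class prior of speech-dominant bins (hyperparameter).
    eta in [-1, 1]: eta = 0 gives the supervised PN risk; eta > 0 mixes in the
    PU risk, eta < 0 mixes in the NU risk.
    """
    prior_n = 1.0 - prior_p
    loss = sigmoid_loss
    # Ordinary supervised (PN) risk.
    r_pn = prior_p * loss(g_p).mean() + prior_n * loss(-g_n).mean()
    if eta >= 0:
        # PU risk: positives plus unlabeled data treated as negatives, with a correction term.
        r_pu = prior_p * (loss(g_p) - loss(-g_p)).mean() + loss(-g_u).mean()
        return (1.0 - eta) * r_pn + eta * r_pu
    # NU risk: negatives plus unlabeled data treated as positives, with a correction term.
    r_nu = prior_n * (loss(-g_n) - loss(g_n)).mean() + loss(g_u).mean()
    return (1.0 + eta) * r_pn - eta * r_nu

# Toy 2x2 magnitude spectrograms (frequency x time), made up for illustration.
speech = np.array([[2.0, 1.0], [0.0, 3.0]])
noise = np.array([[1.0, 2.0], [1.0, 1.0]])
noisy = np.array([[3.0, 3.0], [1.0, 4.0]])

mask = label_bins(speech, noise)
enhanced = apply_binary_mask(noisy, mask)
```

In a training loop, `g_p` and `g_n` would be the network outputs on labeled bins from supervised data and `g_u` the outputs on bins from noisy-only recordings; the risk would be minimized with an automatic-differentiation framework rather than plain NumPy.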
