July 28, 2024 · Open Access

ctPuLSE: Close-Talk, and Pseudo-Label Based Far-Field, Speech Enhancement


Abstract

The current dominant approach to neural speech enhancement is purely supervised deep learning on simulated pairs of far-field noisy-reverberant speech (i.e., mixtures) and clean speech. The trained models, however, often generalize poorly to real-recorded mixtures. To address this, this paper investigates training enhancement models directly on real mixtures. A major difficulty with this approach is that, because the clean speech of real mixtures is unavailable, there is no good supervision for them. In this context, assuming that a training set of real-recorded pairs of close-talk and far-field mixtures is available, we propose to address this difficulty via close-talk speech enhancement: an enhancement model is first trained on simulated mixtures to enhance the real-recorded close-talk mixtures, and the estimated close-talk speech is then used as supervision (i.e., a pseudo-label) for training far-field speech enhancement models directly on the paired real-recorded far-field mixtures. We name the proposed system ctPuLSE. Evaluation results on the CHiME-4 dataset show that ctPuLSE can derive high-quality pseudo-labels and yield far-field speech enhancement models with strong generalizability to real data.
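The three-stage data flow described above (train on simulated pairs, enhance real close-talk mixtures into pseudo-labels, then train a far-field model on real far-field mixtures against those pseudo-labels) can be illustrated with a deliberately tiny toy. This is a sketch under strong assumptions and is not the author's implementation: each "enhancement model" is a single scalar gain fitted by least squares in place of a neural network, mixtures are clean signals plus white Gaussian noise (no reverberation, no time misalignment between microphones), and all function names and noise levels are hypothetical.

```python
import random
from typing import List, Tuple

Signal = List[float]


def mse(a: Signal, b: Signal) -> float:
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def fit_gain(mixtures: List[Signal], targets: List[Signal]) -> float:
    """Hypothetical stand-in for training an enhancement model: the scalar gain w
    minimising sum ||w * mixture - target||^2, solved in closed form."""
    num = sum(x * y for m, t in zip(mixtures, targets) for x, y in zip(m, t))
    den = sum(x * x for m in mixtures for x in m)
    return num / den


def enhance(gain: float, mixture: Signal) -> Signal:
    """Apply the toy 'model' (a scalar gain) to a mixture."""
    return [gain * x for x in mixture]


def add_noise(clean: Signal, noise_std: float, rng: random.Random) -> Signal:
    """Toy mixture: clean signal plus white Gaussian noise (reverberation ignored)."""
    return [x + rng.gauss(0.0, noise_std) for x in clean]


def ctpulse_toy(n_utts: int = 8, n_samples: int = 500,
                seed: int = 0) -> Tuple[float, float, float]:
    """Return (raw far-field error, enhanced far-field error, pseudo-label error),
    each measured against the hidden true speech."""
    rng = random.Random(seed)

    def speech() -> Signal:
        return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

    # Stage 1: train a close-talk model on simulated (mixture, clean) pairs.
    sim_clean = [speech() for _ in range(n_utts)]
    sim_mix = [add_noise(s, 0.3, rng) for s in sim_clean]
    ct_gain = fit_gain(sim_mix, sim_clean)

    # "Real" recordings: paired close-talk (low noise) and far-field (high noise)
    # mixtures of the same speech. The true speech is never used for training.
    real_speech = [speech() for _ in range(n_utts)]
    real_close = [add_noise(s, 0.1, rng) for s in real_speech]
    real_far = [add_noise(s, 1.0, rng) for s in real_speech]

    # Stage 2: enhance the real close-talk mixtures to obtain pseudo-labels.
    pseudo_labels = [enhance(ct_gain, m) for m in real_close]

    # Stage 3: train the far-field model on real far-field mixtures,
    # supervised by the pseudo-labels instead of the unavailable clean speech.
    ff_gain = fit_gain(real_far, pseudo_labels)

    # Evaluation only: compare against the hidden true speech.
    raw_err = sum(mse(m, s) for m, s in zip(real_far, real_speech)) / n_utts
    enh_err = sum(mse(enhance(ff_gain, m), s)
                  for m, s in zip(real_far, real_speech)) / n_utts
    pl_err = sum(mse(p, s) for p, s in zip(pseudo_labels, real_speech)) / n_utts
    return raw_err, enh_err, pl_err


if __name__ == "__main__":
    raw, enhanced, pl = ctpulse_toy()
    print(f"far-field mixture MSE:      {raw:.3f}")
    print(f"enhanced far-field MSE:     {enhanced:.3f}")
    print(f"pseudo-label MSE:           {pl:.3f}")
```

In this toy setting the far-field "model" never sees clean speech, yet its output should come out noticeably closer to the true speech than the unprocessed far-field mixture, because the low-noise close-talk channel supplies a usable target. A real system would replace the scalar gain with a neural network and would also have to cope with effects this sketch ignores, such as reverberation and differences between the close-talk and far-field signal paths.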


Cite This Study

Zhong-Qiu Wang. ctPuLSE: Close-Talk, and Pseudo-Label Based Far-Field, Speech Enhancement. arXiv:2407.19485, 2024.

synapsesocial.com/papers/68e5ec45b6db6435875814c7 https://doi.org/10.48550/arxiv.2407.19485