March 18, 2024Open Access

Leveraging Sound Localization to Improve Continuous Speaker Separation

Puntos clave

Los puntos clave no están disponibles para este artículo en este momento.

Resumen

Continuous speaker separation aims to separate overlapping speakers in real-world environments like meetings, but it often falls short in isolating speech segments of a single speaker. This leads to split signals that adversely affect downstream applications such as automatic speech recognition and speaker diarization. Existing solutions like speaker counting have limitations. This paper presents a novel multi-channel approach for continuous speaker separation based on multi-input multi-output (MIMO) complex spectral mapping. This MIMO approach enables robust speaker localization by preserving inter-channel phase relations. Speaker localization as a byproduct of the MIMO separation model is then used to identify single-talker frames and reduce speaker splitting. We demonstrate that this approach achieves superior frame-level sound localization. Systematic experiments on the LibriCSS dataset further show that the proposed approach outperforms other methods, advancing state-of-the-art speaker separation performance.

Leer artículo completoexternamente

Me gusta

Guardar

Ver artículo completo