September 1, 2024

DiffVC+: Improving Diffusion-based Voice Conversion for Speaker Anonymization

Abstract

The increasing risks of speech data leakage prompt growing concerns about voice privacy. This paper proposes DiffVC+, a speaker anonymization model designed to preserve speech privacy. It operates as a diffusion-based voice conversion model that suppresses identity information by converting the speaker's voice through flexible approaches. DiffVC+ comprises a self-supervised learning (SSL) content encoder that effectively extracts the source speech content, a speaker encoder and an embedding generator that both supply the target speaker embedding, and a diffusion-based decoder generating the converted speech. Furthermore, we propose DiffVC+ light and DiffVC+ decoupled for edge-side and server-side deployments, respectively. Experimental results demonstrate that our models significantly outperform the baseline in terms of the intelligibility and naturalness of the converted speech, while achieving competitive anonymization performance.
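
The component layout described in the abstract can be pictured with a toy data-flow sketch. It is not the authors' implementation. Every function below is a hypothetical stand-in, the embedding size is arbitrary, and the idea that the embedding generator samples a random pseudo-speaker is an assumption made only for illustration. The decoder stub performs no diffusion; it only shows that the output depends on both the content features and the target speaker embedding.

```python
import random
from typing import List, Optional

Vector = List[float]

EMB_DIM = 4  # hypothetical embedding size, chosen only for this toy example


def content_encoder(source_wave: Vector) -> List[Vector]:
    """Stand-in for the SSL content encoder: one 1-D feature per 2-sample frame."""
    return [[sum(source_wave[i:i + 2])] for i in range(0, len(source_wave), 2)]


def speaker_encoder(reference_wave: Vector) -> Vector:
    """Stand-in for the speaker encoder: embeds speech from a chosen target speaker."""
    mean = sum(reference_wave) / len(reference_wave)
    return [mean] * EMB_DIM


def embedding_generator(rng: random.Random) -> Vector:
    """Stand-in for the embedding generator (assumption: samples a pseudo-speaker)."""
    return [rng.gauss(0.0, 1.0) for _ in range(EMB_DIM)]


def decoder(content: List[Vector], speaker: Vector) -> List[Vector]:
    """Stand-in for the diffusion-based decoder. No diffusion happens here:
    each content frame is simply concatenated with the speaker embedding."""
    return [frame + speaker for frame in content]


def anonymize(source_wave: Vector,
              reference_wave: Optional[Vector] = None,
              seed: int = 0) -> List[Vector]:
    """Wire the components: content from the source, identity from elsewhere."""
    content = content_encoder(source_wave)
    if reference_wave is not None:
        speaker = speaker_encoder(reference_wave)              # target from real speech
    else:
        speaker = embedding_generator(random.Random(seed))     # generated target
    return decoder(content, speaker)
```

Calling `anonymize(source)` uses the generated embedding, while `anonymize(source, reference)` takes the target identity from reference speech. Both paths feed the same decoder, which mirrors the two sources of the target speaker embedding mentioned in the abstract.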

Cite This Study

Huang et al. studied this question.

synapsesocial.com/papers/68e59c5cb6db643587537025 https://doi.org/10.21437/interspeech.2024-502