The increasing risks of speech data leakage prompt growing concerns about voice privacy. This paper proposes DiffVC+, a speaker anonymization model designed to preserve speech privacy. It operates as a diffusion-based voice conversion model that suppresses identity information by converting the speaker's voice through flexible approaches. DiffVC+ comprises a self-supervised learning (SSL) content encoder that effectively extracts the source speech content, a speaker encoder and an embedding generator that both supply the target speaker embedding, and a diffusion-based decoder generating the converted speech. Furthermore, we propose DiffVC+ light and DiffVC+ decoupled for edge-side and server-side deployments, respectively. Experimental results demonstrate that our models significantly outperform the baseline in terms of the intelligibility and naturalness of the converted speech, while achieving competitive anonymization performance.
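The data flow described above (content encoder, a target speaker embedding from either a speaker encoder or an embedding generator, then a decoder) can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in rather than the DiffVC+ implementation: the function names, the frame size, the two-dimensional embeddings, and in particular the decoder, which is a toy iterative refinement from noise and not a real diffusion sampler.

```python
import random
from typing import List, Optional

Vector = List[float]
FRAME = 4  # hypothetical frame size, chosen only for this toy example


def content_encoder(waveform: Vector) -> List[Vector]:
    # Stand-in for the SSL content encoder: split the signal into fixed-size frames.
    usable = len(waveform) - len(waveform) % FRAME
    return [waveform[i:i + FRAME] for i in range(0, usable, FRAME)]


def speaker_encoder(reference: Vector) -> Vector:
    # Stand-in for the speaker encoder: summarize a reference utterance.
    mean = sum(reference) / len(reference)
    energy = sum(x * x for x in reference) / len(reference)
    return [mean, energy]


def embedding_generator(seed: int) -> Vector:
    # Stand-in for the embedding generator: sample a pseudo-speaker embedding.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]


def decoder(content: List[Vector], speaker: Vector, steps: int = 8, seed: int = 0) -> Vector:
    # Toy iterative refinement, NOT a real diffusion sampler: start from noise
    # and move step by step toward a target conditioned on content and speaker.
    rng = random.Random(seed)
    offset = sum(speaker) / len(speaker)
    target = [x + offset for frame in content for x in frame]
    sample = [rng.gauss(0.0, 1.0) for _ in target]
    for step in range(steps):
        remaining = steps - step
        sample = [s + (t - s) / remaining for s, t in zip(sample, target)]
    return sample


def anonymize(waveform: Vector, reference: Optional[Vector] = None, seed: int = 0) -> Vector:
    # The target embedding comes from a reference utterance when one is given,
    # otherwise from the generator, mirroring the two sources named above.
    content = content_encoder(waveform)
    speaker = speaker_encoder(reference) if reference is not None else embedding_generator(seed)
    return decoder(content, speaker, seed=seed)


source = [0.1 * i for i in range(10)]
converted_from_reference = anonymize(source, reference=[0.5, -0.5, 0.25])
converted_from_generator = anonymize(source, seed=42)
print(len(converted_from_reference), len(converted_from_generator))
```

The point of the sketch is only the interface: the source content and the target speaker embedding are computed independently, so the embedding source can be swapped without touching the content path or the decoder.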
Huang et al. studied this question.