August 27, 2021. Open Access

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations


Abstract

We propose using self-supervised discrete representations for the task of speech resynthesis. To generate disentangled representations, we separately extract low-bitrate representations for speech content, prosodic information, and speaker identity. This allows speech to be synthesized in a controllable manner. We analyze various state-of-the-art, self-supervised representation learning methods and shed light on the advantages of each method while considering reconstruction quality and disentanglement properties. Specifically, we evaluate the F0 reconstruction, speaker identification performance (for both resynthesis and voice conversion), the recordings' intelligibility, and overall quality using subjective human evaluation. Lastly, we demonstrate how these representations can be used for an ultra-lightweight speech codec. Using the obtained representations, we can get to a rate of 365 bits per second while providing better speech quality than the baseline. Audio samples can be found under the following link: [...].github.io/resynthesis.
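To make the bitrate claim concrete, the following is a minimal sketch of how the rate of a codec built from discrete token streams could be estimated: each stream contributes its frame rate times the number of bits needed to index its codebook. The function names, the stream names, and every frame rate and codebook size below are hypothetical values chosen for illustration; they are not the paper's configuration, and the sketch ignores any one-off side information such as a per-utterance speaker representation.

```python
import math


def stream_bitrate(frame_rate_hz: float, codebook_size: int) -> float:
    """Bits per second for one stream of discrete codes.

    Each frame emits one code index, which costs log2(codebook_size) bits
    when indices are stored with a fixed-length (uncompressed) encoding.
    """
    if frame_rate_hz <= 0 or codebook_size < 2:
        raise ValueError("need a positive frame rate and at least 2 codes")
    return frame_rate_hz * math.log2(codebook_size)


def total_bitrate(streams: dict[str, tuple[float, int]]) -> float:
    """Sum the per-stream rates of independently coded streams."""
    return sum(stream_bitrate(rate, size) for rate, size in streams.values())


# Hypothetical configuration (NOT taken from the paper):
# a content stream and a coarser prosody (F0) stream.
example_streams = {
    "content": (50.0, 128),  # 50 frames/s, 128 codes -> 7 bits per frame
    "f0": (10.0, 32),        # 10 frames/s, 32 codes  -> 5 bits per frame
}

if __name__ == "__main__":
    for name, (rate, size) in example_streams.items():
        print(f"{name}: {stream_bitrate(rate, size):.1f} bits/s")
    print(f"total: {total_bitrate(example_streams):.1f} bits/s")
```

Under this accounting, lowering the frame rate or shrinking the codebook of any one stream reduces the total rate, which is why separately coded low-rate streams for content and prosody can add up to only a few hundred bits per second.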
