November 27, 2002

Spectral voice conversion for text-to-speech synthesis


Abstract

A new voice conversion algorithm is presented that modifies a source speaker's speech so that it sounds as if produced by a target speaker. The algorithm is applied to a residual-excited LPC text-to-speech diphone synthesizer. Spectral parameters are mapped using a locally linear transformation based on Gaussian mixture models whose parameters are trained by joint density estimation. The LPC residuals are adjusted to match the target speaker's average pitch. To study the effect of the amount of training data on performance, data sets of varying sizes are created by automatically selecting subsets of all available diphones with a vector quantization method. In an objective evaluation, the proposed method performs more reliably on small training sets than a previous approach. In perceptual tests, nearly optimal spectral conversion performance is achieved even with a small amount of training data; speech quality, however, improves as the training set grows.
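
The abstract does not give the mapping formula, so the following is only a minimal sketch of one common way to realize a locally linear, GMM-based transformation. It assumes a Gaussian mixture has already been fitted to stacked source/target vectors [x, y] (the joint density), and that conversion is the posterior-weighted sum of per-component linear regressions from x to y. The function name `convert` and the parameter layout are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np


def convert(x, weights, means, covs):
    """Estimate a target spectral vector from a source vector x.

    Assumed (hypothetical) layout of the joint GMM parameters:
      weights: (K,)         mixture weights
      means:   (K, 2d)      stacked [mu_x, mu_y] per component
      covs:    (K, 2d, 2d)  joint covariance per component
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)

    d = x.shape[0]
    n_comp = weights.shape[0]
    log_post = np.empty(n_comp)
    local = np.empty((n_comp, means.shape[1] - d))

    for i in range(n_comp):
        mu_x, mu_y = means[i, :d], means[i, d:]
        s_xx = covs[i, :d, :d]   # source block of the joint covariance
        s_yx = covs[i, d:, :d]   # cross-covariance between target and source
        diff = x - mu_x
        sol = np.linalg.solve(s_xx, diff)  # s_xx^{-1} (x - mu_x)

        # Log of weight_i * N(x; mu_x, s_xx), used for the component posterior.
        _, logdet = np.linalg.slogdet(s_xx)
        log_post[i] = np.log(weights[i]) - 0.5 * (
            diff @ sol + logdet + d * np.log(2.0 * np.pi)
        )

        # Local linear regression of this component.
        local[i] = mu_y + s_yx @ sol

    # Normalize posteriors in a numerically stable way.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    # Posterior-weighted blend of the local linear predictions.
    return post @ local
```

With a single component the sketch reduces to ordinary linear regression from source to target, and with several components each region of the source space gets its own linear map, blended smoothly by the posteriors. In practice the joint parameters could come from any EM-based mixture fit on time-aligned source/target frames; that training step is outside this sketch.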


Cite This Study

Kain et al. studied this question.

synapsesocial.com/papers/6a2229bd4ae3d5108796fd46 https://doi.org/10.1109/icassp.1998.674423
