November 30, 1998

HMM-based smoothing for concatenative speech synthesis


Abstract

This paper focuses on our recent efforts to further improve the acoustic quality of the Whistler Text-to-Speech engine. We have developed an advanced smoothing system that, according to a small pilot study, significantly improves quality. We represent speech as a sequence of frames, where each frame can be synthesized from a parameter vector. Each frame is represented by a state in an HMM, where the output distribution of each state is a Gaussian random vector consisting of x and Δx. The set of vectors that maximizes the HMM probability is the representation of the smoothed speech output. This technique follows our traditional goal of developing methods whose parameters are automatically learned from data with minimal human intervention. The general framework is demonstrated to be robust by maintaining improved quality with a significant reduction in data.

1. INTRODUCTION

In contrast to most Text-To-Speech (TTS) systems (including both formant and concatena...
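The smoothing idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a single scalar parameter per frame, diagonal Gaussians, and a delta defined as Δx[t] = x[t] - x[t-1]; the function name `smooth_track` and its arguments are hypothetical. Under those assumptions, maximizing the joint Gaussian likelihood of x and Δx is a weighted least-squares problem with a closed-form solution.

```python
import numpy as np

def smooth_track(mu, var, dmu, dvar):
    """Return the track x that maximizes the Gaussian likelihood of x and delta-x.

    mu, var   : per-frame mean and variance of the static parameter (length T)
    dmu, dvar : mean and variance of the delta between adjacent frames (length T-1)
    Assumption: delta-x is the first difference x[t] - x[t-1].
    """
    mu = np.asarray(mu, dtype=float)
    T = mu.size
    # First-difference operator, shape (T-1, T): (D @ x)[t] = x[t+1] - x[t]
    D = np.diff(np.eye(T), axis=0)
    W_x = np.diag(1.0 / np.asarray(var, dtype=float))   # static precisions
    W_d = np.diag(1.0 / np.asarray(dvar, dtype=float))  # delta precisions
    # Setting the gradient of the log-likelihood to zero gives A x = b
    A = W_x + D.T @ W_d @ D
    b = W_x @ mu + D.T @ W_d @ np.asarray(dmu, dtype=float)
    return np.linalg.solve(A, b)

# Hypothetical example: a step at a unit boundary, with deltas expected near zero
mu = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
smoothed = smooth_track(mu, np.ones(6), np.zeros(5), np.full(5, 0.1))
print(np.round(smoothed, 3))
```

A small delta variance pulls neighboring frames together and softens the discontinuity at the boundary, while a very large delta variance leaves the static means essentially unchanged. A real system would work with full parameter vectors and would exploit the tridiagonal structure of `A` rather than building dense matrices.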


Cite This Study

Plumpe et al. (1998) studied this question.

synapsesocial.com/papers/6a1bcbe31567d2fc4d5f03f1 https://doi.org/10.21437/icslp.1998-52
