This study addresses a significant gap in voice assistant research by evaluating the responsiveness of open-source text-to-speech (TTS) models, an often overlooked yet critical component of real-time applications. Responsiveness here means the speed at which a TTS system generates speech in reaction to input, which is crucial for maintaining natural, real-time interactions. While extensive benchmarking has been performed on speech-to-text and large language models, little work has focused on how efficiently TTS systems respond in live settings. This is largely because TTS research has historically prioritized subjective quality metrics such as naturalness and intelligibility, which are easier to assess through human listening tests than real-time performance is; in addition, the lack of standardized, reproducible tools for measuring latency and responsiveness has further limited progress in this area. This work presents the first comprehensive benchmark focused on responsiveness, assessing TTS latency, tail latency, and real-time processing performance across 13 prominent open-source, readily available models, in contrast to commercial systems such as Amazon Polly or Google Cloud TTS, which are closed-source and paywalled. Using a standardized single-stream evaluation inspired by MLPerf Inference, the study measures model responsiveness under controlled conditions and also investigates trade-offs between speed and audio quality. Results reveal substantial variability across models: some achieve sub-second latency suitable for interactive systems, while others fall short of real-time standards. The benchmark highlights performance bottlenecks in autoregressive architectures and identifies parallel and flow-based models as more efficient for low-latency scenarios. Importantly, the proposed framework provides a reproducible foundation for comparing TTS models in latency-sensitive environments and sets a baseline for future research.
By focusing on responsiveness, this work contributes to the development of more effective and natural voice interfaces.
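To make the measured quantities concrete, the following is a minimal sketch of a single-stream latency harness, not the authors' benchmark code. It assumes a hypothetical `synthesize(text)` callable that returns a sample count and a sample rate; real TTS libraries expose different interfaces. The choice of the 90th and 99th percentiles for tail latency, and of a real-time factor computed as total synthesis time divided by total audio duration, are also assumptions made for illustration.

```python
import math
import time
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface (an assumption, not a real library API):
# a synthesizer takes text and returns (num_audio_samples, sample_rate_hz).
Synthesizer = Callable[[str], Tuple[int, int]]


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def run_single_stream(
    synthesize: Synthesizer,
    texts: Sequence[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[List[float], List[float]]:
    """Issue one request at a time; the next starts only after the previous returns."""
    latencies: List[float] = []
    durations: List[float] = []
    for text in texts:
        start = clock()
        num_samples, sample_rate = synthesize(text)
        latencies.append(clock() - start)          # wall-clock synthesis time
        durations.append(num_samples / sample_rate)  # length of generated audio
    return latencies, durations


def summarize(latencies: Sequence[float], durations: Sequence[float]) -> Dict[str, float]:
    """Aggregate per-request timings into latency, tail latency, and real-time factor."""
    return {
        "mean_latency_s": sum(latencies) / len(latencies),
        "p90_latency_s": percentile(latencies, 90),
        "p99_latency_s": percentile(latencies, 99),
        # A real-time factor below 1 means audio is produced faster than it plays back.
        "rtf": sum(latencies) / sum(durations),
    }


if __name__ == "__main__":
    def dummy_synthesize(text: str) -> Tuple[int, int]:
        # Stand-in for a real model: sleeps briefly, pretends to emit 1 s of audio.
        time.sleep(0.01)
        return 22050, 22050

    lat, dur = run_single_stream(dummy_synthesize, ["hello", "world", "again"])
    print(summarize(lat, dur))
```

Passing the clock as a parameter keeps the timing loop testable with a fake clock, and the dummy synthesizer can be replaced by a thin wrapper around whichever model is under test.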
Ha Pham Thien Dinh
Rutherford Agbeshi Patamia
Ming Liu
Dinh et al. (Mon,) studied this question.
www.synapsesocial.com/papers/68a36c270a429f797332fe9b — DOI: https://doi.org/10.20944/preprints202508.0654.v1