September 1, 2024 · Open Access

JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis

Abstract

Non-autoregressive GAN-based neural vocoders are widely used because of their fast inference and high perceptual quality. However, their outputs often contain audible artifacts, such as tonal artifacts. We therefore propose JenGAN, a new training strategy that stacks shifted low-pass filters to ensure shift equivariance. This method helps prevent aliasing and reduces artifacts while preserving the model structure used during inference. In our experimental evaluation, JenGAN consistently enhances the performance of vocoder models, yielding significantly superior scores across the majority of evaluation metrics.
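As a generic illustration of why low-pass filtering helps against aliasing, the sketch below compares naive 2x decimation of a high-frequency tone with decimation after a windowed-sinc low-pass filter. It is not the JenGAN training procedure, which the abstract does not specify in detail; the function names, filter length, cutoff, and test tone are all assumptions chosen for the demonstration.

```python
import math

def windowed_sinc_lowpass(num_taps, cutoff):
    """Hamming-windowed sinc FIR low-pass (hypothetical helper).

    cutoff is in cycles per sample, with 0 < cutoff < 0.5.
    Taps are normalized so the gain at DC is 1.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]

def convolve_same(signal, taps):
    """Zero-padded FIR filtering that returns len(signal) samples."""
    half = len(taps) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + half - k
            if 0 <= j < len(signal):
                acc += t * signal[j]
        out.append(acc)
    return out

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

# Assumed test tone at 0.4 cycles/sample. After 2x decimation the Nyquist
# limit is 0.25 cycles per original sample, so this tone cannot be represented.
signal = [math.sin(2 * math.pi * 0.4 * n) for n in range(512)]

# Naive decimation: the tone folds back as a spurious lower-frequency tone.
naive = signal[::2]

# Low-pass first (cutoff below the new Nyquist), then decimate.
taps = windowed_sinc_lowpass(63, 0.2)
filtered = convolve_same(signal, taps)[::2]

# Skip the edges, where zero padding distorts the filtered signal.
naive_rms = rms(naive[32:-32])
filtered_rms = rms(filtered[32:-32])
print(f"aliased energy without filter (RMS): {naive_rms:.3f}")
print(f"aliased energy with filter    (RMS): {filtered_rms:.5f}")
```

Without the filter, the out-of-band tone survives decimation at nearly full amplitude as an alias; with the filter it is strongly attenuated before it can fold back into the band.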

Cite This Study

Cho et al. (2024) studied this question.

synapsesocial.com/papers/68e59e8eb6db6435875389a1
https://doi.org/10.21437/interspeech.2024-1447
