Subjective evaluation is the gold standard for assessing speech quality in tasks such as text-to-speech (TTS) and voice cloning (VC). However, such evaluations are costly, time-consuming, and hard to scale. To address these challenges, we propose IndicMOS, a multilingual mean opinion score (MOS) predictor for Indian languages. We train our models on ratings data from the Indic TTS and TTS + VC Challenges. We assess open-source MOS predictors, train unsupervised MOS predictors, and fine-tune Wav2Vec2-based pre-trained models. We further incorporate additional features to enhance performance. We also analyze zero-shot evaluation results for Indian languages, reporting mean squared error and correlation metrics. Our models achieve a Kendall Tau of 0.8095 (system level) and 0.7143 (utterance level) for TTS, and 0.5131 (system level) and 0.4292 (utterance level) for TTS + VC. We release our best models as open source.
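The abstract reports mean squared error and Kendall Tau at both the utterance level and the system level. The following is a minimal sketch of how such metrics are commonly computed, not the authors' evaluation code. It assumes that a system-level score is the mean of that system's utterance scores, and it uses the tau-b variant of Kendall Tau; the function names and the toy ratings are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt
from statistics import mean


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length score lists (tie-aware)."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both lists: excluded from both denominator terms
        if dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied_x = concordant + discordant + only_y_tied  # pairs not tied in x
    untied_y = concordant + discordant + only_x_tied  # pairs not tied in y
    denom = sqrt(untied_x * untied_y)
    return (concordant - discordant) / denom if denom else float("nan")


def mse(true, pred):
    """Mean squared error between reference and predicted scores."""
    return mean([(t - p) ** 2 for t, p in zip(true, pred)])


def system_means(system_ids, scores):
    """Average utterance scores per system (assumed system-level definition)."""
    buckets = defaultdict(list)
    for sid, score in zip(system_ids, scores):
        buckets[sid].append(score)
    return {sid: mean(vals) for sid, vals in buckets.items()}


def evaluate(system_ids, true_mos, pred_mos):
    """Return utterance-level and system-level MSE and Kendall tau-b."""
    true_sys = system_means(system_ids, true_mos)
    pred_sys = system_means(system_ids, pred_mos)
    order = sorted(true_sys)
    t_sys = [true_sys[s] for s in order]
    p_sys = [pred_sys[s] for s in order]
    return {
        "utt_mse": mse(true_mos, pred_mos),
        "utt_tau": kendall_tau_b(true_mos, pred_mos),
        "sys_mse": mse(t_sys, p_sys),
        "sys_tau": kendall_tau_b(t_sys, p_sys),
    }


# Toy data (hypothetical): three systems, two rated utterances each.
system_ids = ["A", "A", "B", "B", "C", "C"]
true_mos = [4.0, 4.5, 3.0, 3.5, 2.0, 2.5]
pred_mos = [4.2, 4.1, 3.4, 3.0, 2.2, 2.6]
results = evaluate(system_ids, true_mos, pred_mos)
```

On this toy data the predictor ranks the three systems correctly even though it misorders some utterances within a system, which illustrates why system-level correlation is typically higher than utterance-level correlation, as in the figures reported above.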
Udupa et al. studied this question.
www.synapsesocial.com/papers/68e59e92b6db643587538b3e — DOI: https://doi.org/10.21437/interspeech.2024-1967
Sathvik Udupa
Soumi Maiti
Prasanta Ghosh