September 1, 2024

IndicMOS: Multilingual MOS Prediction for 7 Indian languages


Abstract

Subjective evaluation is the gold standard for evaluating speech in tasks such as text-to-speech (TTS) and voice cloning (VC). However, such evaluations are costly, time-consuming, and difficult to scale. To address these challenges, we propose IndicMOS, a multilingual mean opinion score (MOS) predictor for Indian languages. We train our models on ratings data from the Indic TTS and TTS + VC Challenges. We assess open-source MOS predictors, train unsupervised MOS predictors, and fine-tune Wav2Vec2-based pre-trained models, and we incorporate additional features to improve performance. We also analyze zero-shot evaluation results for Indian languages, reporting mean squared error and correlation metrics. Our best models achieve a Kendall Tau of 0.8095 (system level) and 0.7143 (utterance level) for TTS, and 0.5131 (system level) and 0.4292 (utterance level) for TTS + VC. We release these models as open source.
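The abstract reports mean squared error and Kendall Tau at both the system level and the utterance level. The sketch below illustrates one common convention for these metrics, assuming (this is not stated in the abstract) that utterance-level scores compare predicted and human MOS clip by clip, while system-level scores first average each system's clips and then compare the per-system means. The function names `kendall_tau_b`, `mse`, and `evaluate` and the toy ratings are hypothetical and for illustration only; they are not taken from the IndicMOS release.

```python
from collections import defaultdict
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences (O(n^2), tie-aware)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both: excluded from both tie counts here
            elif dx == 0:
                ties_x += 1  # tied only in x
            elif dy == 0:
                ties_y += 1  # tied only in y
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denom = sqrt((concordant + discordant + ties_x) * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


def mse(true, pred):
    """Mean squared error between human and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true)


def evaluate(records):
    """records: iterable of (system_id, human_mos, predicted_mos), one per utterance.

    Assumed convention: utterance level uses every clip; system level uses
    per-system mean scores.
    """
    records = list(records)
    true = [r[1] for r in records]
    pred = [r[2] for r in records]

    # Group clips by system to compute per-system mean scores.
    by_sys = defaultdict(lambda: ([], []))
    for sys_id, t, p in records:
        by_sys[sys_id][0].append(t)
        by_sys[sys_id][1].append(p)
    sys_true = [sum(ts) / len(ts) for ts, _ in by_sys.values()]
    sys_pred = [sum(ps) / len(ps) for _, ps in by_sys.values()]

    return {
        "utt_mse": mse(true, pred),
        "utt_ktau": kendall_tau_b(true, pred),
        "sys_mse": mse(sys_true, sys_pred),
        "sys_ktau": kendall_tau_b(sys_true, sys_pred),
    }


# Toy, made-up ratings: three hypothetical systems, two clips each.
records = [
    ("sysA", 4.5, 4.2), ("sysA", 4.0, 4.1),
    ("sysB", 3.0, 3.4), ("sysB", 3.5, 3.0),
    ("sysC", 2.0, 2.5), ("sysC", 2.5, 2.2),
]
res = evaluate(records)
print(res)
```

On this toy data the predictor misorders two pairs of individual clips (utterance-level tau of 11/15), but the per-system averages are ranked perfectly (system-level tau of 1.0). Averaging over a system's clips smooths out per-clip errors, which is one reason system-level correlations are often higher than utterance-level ones.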
