April 17, 2019

Improving Deep Models of Speech Quality Prediction through Voice Activity Detection and Entropy-based Measures

Abstract

This paper explores Deep machine listening for Estimating Speech Quality (DESQ), which predicts perceived speech quality from phoneme posterior probabilities obtained from a deep neural network. The degradation of phonemes is quantified with the entropy-based Gini measure, which is compared with the previously proposed mean temporal distance (MTD). Since long speech pauses might strongly affect speech quality, we investigate whether voice activity detection (VAD) has a beneficial or detrimental effect on the predictive power of our model. The evaluation is performed by correlating the model output with the mean opinion scores (MOS) of normal-hearing listeners who rated signals degraded by typical VoIP artifacts. The Gini-based measure and MTD yield very similar predictions, with a lower computational cost for the Gini measure. The VAD increases the correlation from r = 0.87 to r = 0.91, which exceeds three competing baselines (ITU-T P.563, ANIQUE+, and SRM-Rnorm).
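The following is a minimal sketch of the scoring idea, not the authors' implementation. It assumes the Gini measure is the per-frame Gini impurity 1 - sum_k p_k^2 of the phoneme posteriors, averaged over frames, and that the VAD is supplied as a boolean speech/non-speech mask per frame. The helper names (`gini_impurity`, `degradation_score`, `pearson_r`) are hypothetical, MTD is not sketched, and the mapping from this score to a MOS estimate is not shown.

```python
import math
from typing import Optional, Sequence


def gini_impurity(posteriors: Sequence[float]) -> float:
    """Gini impurity 1 - sum(p_k^2) of one frame of phoneme posteriors.

    Close to 0 for a confident (peaky) frame, larger when the probability
    mass is spread over many phoneme classes (assumed sign of degradation).
    """
    return 1.0 - sum(p * p for p in posteriors)


def degradation_score(posteriorgram: Sequence[Sequence[float]],
                      speech_mask: Optional[Sequence[bool]] = None) -> float:
    """Mean per-frame Gini impurity, optionally restricted to VAD speech frames.

    speech_mask is an assumed VAD output: True marks a speech frame.
    """
    if speech_mask is None:
        frames = list(posteriorgram)
    else:
        if len(speech_mask) != len(posteriorgram):
            raise ValueError("mask and posteriorgram must have the same length")
        frames = [f for f, is_speech in zip(posteriorgram, speech_mask) if is_speech]
    if not frames:
        raise ValueError("no frames left to score")
    return sum(gini_impurity(f) for f in frames) / len(frames)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, e.g. between model outputs and MOS ratings."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy posteriorgram: one confident frame, one hypothetical flat frame
    # that a VAD has labeled as non-speech.
    frames = [[0.9, 0.05, 0.05], [1 / 3, 1 / 3, 1 / 3]]
    print(degradation_score(frames))                 # both frames averaged
    print(degradation_score(frames, [True, False]))  # speech frame only
```

With the mask, the non-speech frame is excluded from the average, which is the kind of effect the VAD comparison above examines; whether this helps in practice is an empirical question the paper answers with the reported correlations.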

Cite This Study

Ooster et al. (2019).

https://doi.org/10.1109/icassp.2019.8682754