Machine learning (ML) has shown promising results in voice disorder detection over the past decade. However, the diversity of recording conditions, audio content, and languages, together with the scarcity of examples for each combination of these factors, makes it challenging to build ML models that reliably detect voice disorders. Recent advances in Self-Supervised Learning (SSL) offer hope by leveraging large datasets to pretrain models and extract audio features that are highly resilient for downstream tasks. In this paper, we explore commonly used SSL model representations fairly exhaustively to assess their suitability for the downstream task of voice disorder detection. Using a combination of Support Vector Machines (SVM) and feedforward Deep Neural Networks (DNN), we show: i) that the combination of the vowels /a/, /i/, and /u/ performs better than individual vowels; ii) that SSL-based features generalize well to out-of-domain databases; and iii) that while spectral features such as MFCC perform as well as SSL-based features when trained and tested on the same database, their performance seems to deteriorate when training and testing across different databases.
Gupta et al. (Mon,) studied this question.