March 18, 2024 | Open Access

Addressing Data Scarcity in Voice Disorder Detection with Self-Supervised Models

Abstract

Machine learning (ML) has shown promising results in the field of voice disorder detection over the past decade. However, the diversity of recording conditions, audio content, and languages, together with the scarcity of examples for each of these combinations, poses a challenge in building ML models that can reliably detect voice disorders. Recent advancements in Self-Supervised Learning (SSL) offer hope by leveraging large datasets to pretrain models and extract audio features with high resilience for downstream tasks. In this paper, we explore, fairly exhaustively, commonly used SSL model representations to assess their suitability for the downstream task of voice disorder detection. Using a combination of Support Vector Machines (SVM) and feedforward Deep Neural Networks (DNN), we show: i) that the combination of the vowels /a/, /i/, and /u/ performs better than individual vowels; ii) that SSL-based features generalize well to out-of-domain databases; and iii) that while spectral features like MFCC perform equally well compared to SSL-based features when trained and tested on the same database, their performance seems to deteriorate when training and testing across different databases.
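The evaluation protocol described in the abstract (combine per-vowel features, train on one database, test on the same and on a different database) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic random numbers rather than SSL or MFCC features, vowels are combined by simple concatenation (the abstract does not say how they are combined), a nearest-centroid classifier stands in for the SVM and DNN used in the paper, and all function names (`combine_vowels`, `fit_centroids`, `make_database`, and so on) are hypothetical. It is not the authors' implementation.

```python
import random
from statistics import fmean

VOWELS = ("a", "i", "u")


def combine_vowels(per_vowel):
    """Concatenate one speaker's per-vowel feature vectors (assumed scheme)."""
    combined = []
    for vowel in VOWELS:
        combined.extend(per_vowel[vowel])
    return combined


def fit_centroids(X, y):
    """Nearest-centroid stand-in for the paper's SVM/DNN: one mean per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [fmean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def accuracy(centroids, X, y):
    hits = sum(predict(centroids, x) == t for x, t in zip(X, y))
    return hits / len(y)


def make_database(rng, n_speakers, dim, shift):
    """Synthetic placeholder data, not real recordings.

    Label 1 ("disordered") has a higher feature mean than label 0;
    `shift` mimics a recording-condition offset between databases.
    """
    X, y = [], []
    for k in range(n_speakers):
        label = k % 2
        per_vowel = {
            v: [rng.gauss(label * 2.0 + shift, 1.0) for _ in range(dim)]
            for v in VOWELS
        }
        X.append(combine_vowels(per_vowel))
        y.append(label)
    return X, y


rng = random.Random(0)
train_X, train_y = make_database(rng, 40, 4, shift=0.0)
same_X, same_y = make_database(rng, 40, 4, shift=0.0)    # same "database"
other_X, other_y = make_database(rng, 40, 4, shift=0.5)  # different "database"

model = fit_centroids(train_X, train_y)
in_domain_acc = accuracy(model, same_X, same_y)
cross_db_acc = accuracy(model, other_X, other_y)
print(f"in-domain accuracy: {in_domain_acc:.2f}")
print(f"cross-database accuracy: {cross_db_acc:.2f}")
```

Comparing `in_domain_acc` with `cross_db_acc` for two feature types (for example SSL embeddings and MFCCs) is the kind of contrast that finding iii) in the abstract refers to; the numbers printed by this sketch say nothing about the paper's actual results.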

Cite This Study

Gupta et al. (2024) studied this question.

synapsesocial.com/papers/68e73894b6db6435876b221c https://doi.org/10.1109/icassp48485.2024.10446075
