May 25, 2026 | Open Access

AI-Driven Detection and Classification of Voice Disorders Using Acoustic Recordings

Key points

The aim was to develop AI models that detect and classify voice disorders from acoustic recordings.
Multicenter study analyzing recordings from 1948 patients with voice disorders and 665 controls.
Two modeling strategies were evaluated: HuBERT feature extraction paired with various classifiers, and fine-tuning of a pretrained Audio Spectrogram Transformer (AST).
Classifier performance was assessed using AUROC and F1 score.
Healthy vs pathological voice detection was near-perfect, with an AUROC of 0.993 (95% CI, 0.986-0.996) and an F1 score of 0.949.
Distinguishing neurological from non-neurological disorders reached an AUROC of 0.744.
The other binary models using HuBERT features showed modest performance, with AUROCs from 0.669 to 0.764.
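The key points report AUROC and F1 score, with a 95% CI for AUROC. The plain-Python sketch below shows one common way such figures are computed: AUROC through the Mann-Whitney rank formulation (tied scores count half), F1 at a fixed decision threshold, and the interval by percentile bootstrap. The source does not state how its CI was obtained or which threshold was used, so the bootstrap, the 0.5 default threshold, and the function names `auroc`, `f1_at_threshold`, and `bootstrap_ci` are illustrative assumptions, not the study's method.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Labels are assumed to be 0 (healthy) / 1 (pathological); scores are model
# probabilities for class 1. All names here are illustrative, not from the study.


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUROC: Mann-Whitney U / (n_pos * n_neg); ties get average ranks."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs both classes present")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks: List[float] = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def f1_at_threshold(labels: Sequence[int], scores: Sequence[float],
                    threshold: float = 0.5) -> float:
    """F1 = 2TP / (2TP + FP + FN) after thresholding scores (assumed threshold)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def bootstrap_ci(labels: Sequence[int], scores: Sequence[float],
                 metric: Callable[[Sequence[int], Sequence[float]], float],
                 n_resamples: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI; resamples containing only one class are redrawn."""
    if len(set(labels)) < 2:
        raise ValueError("need both classes to bootstrap a two-class metric")
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue
        stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


if __name__ == "__main__":
    # Toy data for illustration only; not study data.
    y = [0, 0, 1, 1]
    s = [0.1, 0.4, 0.35, 0.8]
    print(auroc(y, s))            # 0.75
    print(f1_at_threshold(y, s))  # 2/3
    print(bootstrap_ci(y, s, auroc))
```

AUROC is threshold-free, whereas F1 depends on the chosen cut-off, which is one reason the two metrics can rank models differently.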

Abstract

OBJECTIVES: To develop and evaluate artificial intelligence (AI) models for detecting and classifying voice disorders from acoustic recordings, with the aim of facilitating earlier diagnosis and optimizing clinical resource allocation.

METHODS: This multicenter predictive modeling study analyzed data from a large cohort of 1948 patients with voice disorders and 665 controls collected at two Belgian hospitals between 2014 and 2025. Acoustic recordings of seven standardized speech tasks were analyzed. A fixed split reserved 85% of the data for training, with 10-fold stratified cross-validation (CV), and the remaining 15% as an independent hold-out test set. Two modeling strategies were evaluated: (1) extraction of HuBERT features paired with various classifiers and (2) fine-tuning of a pretrained Audio Spectrogram Transformer (AST). Six binary diagnostic classifiers were trained: healthy vs pathological and five one-vs-rest (OvR) classifiers within the pathological cohort (neurological, benign lesion, functional, inflammatory, and tumor). A hierarchical ensemble combined the models, using healthy vs pathological as the primary binary gatekeeper and the secondary OvR models to further classify specific voice disorders. AUROC and F1 score were the primary performance metrics.

RESULTS: Among 2613 participants (median age, 51 years for patients and 36 years for controls), healthy vs pathological voices were detected near-perfectly, with an AUROC of 0.993 (95% CI, 0.986-0.996) and an F1 score of 0.949. Performance for classifying specific disorder subtypes was lower: distinguishing neurological from non-neurological disorders achieved an AUROC of 0.744. The other binary models using HuBERT features showed modest performance, with AUROCs from 0.669 to 0.764 and F1 scores from 0.447 to 0.680.

CONCLUSIONS: While current AI models, particularly AST, show high diagnostic accuracy in distinguishing pathological from healthy voices, classification of specific disorder subtypes requires further improvement. These findings suggest that AI-driven acoustic analysis has significant potential as a noninvasive screening tool supporting earlier identification of voice disorders.
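The hierarchical ensemble in the abstract can be read as a two-stage routing rule: the healthy vs pathological model acts as a gatekeeper, and only recordings it flags as pathological are scored by the five OvR subtype models. The following is a minimal Python sketch of that routing, not the authors' implementation. It assumes each model is a callable returning a probability for its positive class, a gate threshold of 0.5, and that the subtype with the highest OvR probability is reported; the abstract specifies none of these details, and `classify_recording`, `HierarchicalResult`, and `gate_threshold` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Mapping, Optional, Sequence

# Hypothetical model interface: maps a feature vector (for example, pooled
# HuBERT embeddings) to the probability of the model's positive class.
Model = Callable[[Sequence[float]], float]


@dataclass
class HierarchicalResult:
    p_pathological: float
    is_pathological: bool
    subtype: Optional[str] = None
    subtype_scores: Dict[str, float] = field(default_factory=dict)


def classify_recording(features: Sequence[float], gatekeeper: Model,
                       ovr_models: Mapping[str, Model],
                       gate_threshold: float = 0.5) -> HierarchicalResult:
    """Stage 1: healthy vs pathological gate. Stage 2: OvR subtype models.

    The 0.5 threshold and the argmax over OvR probabilities are assumptions.
    """
    if not ovr_models:
        raise ValueError("at least one OvR subtype model is required")
    p_path = gatekeeper(features)
    if p_path < gate_threshold:
        # Predicted healthy: the subtype models are never consulted.
        return HierarchicalResult(p_path, False)
    scores = {name: model(features) for name, model in ovr_models.items()}
    # OvR models are trained independently, so their outputs need not sum to 1;
    # this sketch simply reports the highest-scoring subtype.
    subtype = max(scores, key=scores.__getitem__)
    return HierarchicalResult(p_path, True, subtype, scores)


if __name__ == "__main__":
    # Toy stand-in models on a 2-D feature vector; illustrative only.
    def gate(f: Sequence[float]) -> float:
        return f[0]

    ovr = {
        "neurological": lambda f: f[1],
        "benign lesion": lambda f: 1 - f[1],
        "functional": lambda f: 0.2,
        "inflammatory": lambda f: 0.1,
        "tumor": lambda f: 0.05,
    }
    print(classify_recording([0.9, 0.8], gate, ovr).subtype)  # neurological
    print(classify_recording([0.2, 0.8], gate, ovr).subtype)  # None
```

In a gated design like this sketch, a pathological recording that the gatekeeper misses never reaches the subtype models, so the gatekeeper's sensitivity caps how many disorders can be subtyped at all.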

Authors

L. Berteloot

AZ Delta

Fergio Sismono

University of Antwerp

Léonore Maertens

AZ Delta

Journals

Journal of Voice

Institutions

University of Antwerp

AZ Delta

Thomas More University
