What question did this study set out to answer?

This study aims to evaluate the effectiveness of AI-generated voice clones in replicating the perceptual intelligibility of dysphonic speech.

May 14, 2026

Can artificial intelligence accurately clone dysphonic voices? A perceptual and intelligibility assessment

Key Points

Created voice clones for 12 speakers (6 dysphonic, 6 healthy) using AI.
Conducted three perceptual experiments with 64 listeners to assess real and synthetic voice samples.
Used discrimination, identification, and intelligibility measures, including the Hearing-in-Noise Test.
In the discrimination task, listener accuracy fell to 31.1% when a real dysphonic sample was paired with an AI clone (RL-AI), a sharp drop from 93.7% when both samples were real (RL-RL).
In noise, AI-generated dysphonic voices of male speakers were 66.5% intelligible, compared with 35.5% for the real dysphonic voices.
AI clones did not replicate the reduced intelligibility or the distinctive voice quality of real dysphonic speech, a shortcoming of current voice cloning models.

Abstract

Artificial Intelligence (AI)-based voice cloning offers a novel approach to studying the perceptual effects of dysphonia, especially in rare voice disorders where clinical recordings are limited. This study evaluated whether AI-generated voice clones can replicate the intelligibility patterns of dysphonic speech. Clones were created for 12 speakers (6 dysphonic, 6 vocally healthy), and 64 listeners assessed both real and synthetic samples across three perceptual experiments. In the first experiment, a discrimination task, listeners performed best when both samples were real (RL-RL), with accuracies of 93.7% for dysphonic and 92.0% for healthy voices. Accuracy dropped sharply in the dysphonic RL-AI condition (31.1%). In the second experiment, an identification task, listeners were more accurate when both samples were real (66.8%) but showed greater confusion when synthetic voices were included. In the third experiment, listeners rated intelligibility in noise using the Hearing-in-Noise Test. AI-generated dysphonic voices were significantly more intelligible than real dysphonic speech, especially for male speakers (66.5% AI vs 35.5% real). Female speakers also showed improved or similar intelligibility with AI clones (67%). Findings suggest that current AI voice cloning models fail to replicate the reduced intelligibility and distinctive voice quality associated with dysphonia, raising concerns for clinical and research applications.
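The per-condition accuracies reported for the discrimination task (for example RL-RL versus RL-AI) are percent-correct figures. As a minimal sketch of how such figures could be tabulated, assuming each trial is scored simply as correct or incorrect, the Python below groups trials by condition and computes percent correct. The function name `accuracy_by_condition` and the toy trials are hypothetical and are not taken from the study, whose actual scoring procedure is not described here.

```python
from collections import defaultdict


def accuracy_by_condition(trials):
    """Return percent correct per condition from (condition, correct) pairs."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for condition, correct in trials:
        totals[condition] += 1
        hits[condition] += int(correct)
    return {c: 100.0 * hits[c] / totals[c] for c in totals}


# Hypothetical toy responses for illustration only, NOT data from the study.
toy_trials = [
    ("RL-RL", True), ("RL-RL", True), ("RL-RL", True), ("RL-RL", False),
    ("RL-AI", True), ("RL-AI", False), ("RL-AI", False), ("RL-AI", False),
]

scores = accuracy_by_condition(toy_trials)
print(scores)
```

Under the same assumption, the function could be applied separately to dysphonic and healthy speakers to produce the kind of side-by-side comparison the abstract reports.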
