Artificial Intelligence (AI)-based voice cloning offers a novel approach to studying the perceptual effects of dysphonia, especially in rare voice disorders where clinical recordings are limited. This study evaluated whether AI-generated voice clones can replicate the intelligibility patterns of dysphonic speech. Clones were created for 12 speakers (6 dysphonic, 6 vocally healthy), and 64 listeners assessed both real and synthetic samples across three perceptual experiments. In the first experiment, a discrimination task, listeners performed best when both samples were real (RL-RL), with accuracies of 93.7% for dysphonic and 92.0% for healthy voices. Accuracy dropped sharply in the dysphonic RL-AI condition (31.1%). In the second experiment, an identification task, listeners were more accurate when both samples were real (66.8%) but showed greater confusion when synthetic voices were included. In the third experiment, listeners rated intelligibility in noise using the Hearing-in-Noise Test. AI-generated dysphonic voices were significantly more intelligible than real dysphonic speech, especially for male speakers (66.5% AI vs 35.5% real). Female speakers also showed improved or similar intelligibility with AI clones (67%). Findings suggest that current AI voice cloning models fail to replicate the reduced intelligibility and distinctive voice quality associated with dysphonia, raising concerns for clinical and research applications.
Bottalico et al. studied this question.