
May 13, 2026 · Open Access

Identity and personality in the social perception of synthesized voices: perceptions of OpenAI’s text-to-speech technology

Key Points

This research investigates how listeners ascribe demographic and personality traits to AI-generated voices.
The authors conducted a production and perception study on voices generated by OpenAI's text-to-speech technology.
Listeners rated synthesized voices for perceived demographic features and personality traits.
The authors performed an acoustic analysis of the voices, examining properties such as subharmonic-to-harmonic ratio, H1-H2, mean f0, and intonational contours.
Listeners consistently associate specific voices with particular combinations of age, race/ethnicity, gender, and personality traits.
Ratings of the voices differ by listener demographics.
Prosodic features may influence how listeners perceive and judge synthesized voices.

Abstract

As the line between human speakers and "AI-generated" voices becomes increasingly blurred, it is important to understand how sociolinguistic knowledge affects human-computer interaction. Human listeners have been shown to rely on real-world biases, along with acoustic cues and their social associations, to characterize AI-synthesized voices, but it is often unclear if or how these factors interact. We examined these issues by conducting a production and perception study on OpenAI's Whisper-generated voices. Listeners heard each of the generated voices and rated them for perceived demographic features and personality traits. We find that particular voices are consistently associated with specific combinations of age, race/ethnicity, gender, and personality traits; we also find that ratings differ by listener demographics. Acoustic analysis indicates that the voices differ in properties such as subharmonic-to-harmonic ratio, H1-H2, mean f0, and intonational contours. Altogether, we find that listeners from various backgrounds converge on meaningful, imagined personae for synthesized voices, and that prosodic features may influence how listeners arrive at these judgments. Human listeners readily ascribe real-world social characteristics to synthesized voices, demonstrating the importance of human experience in human-computer interaction and the deep entrenchment of social judgment in all kinds of communication, even with non-human actors.
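The abstract names several acoustic measures without defining them. As a rough illustration of what two of them capture, the Python sketch below estimates mean f0 by autocorrelation and computes H1-H2, the level difference in dB between the first and second harmonics, on a synthetic two-harmonic signal. This is not the authors' analysis pipeline, which this summary does not describe. The function names, the 16 kHz sample rate, and the toy signal are all hypothetical. Real measurements typically rely on dedicated tools such as Praat, apply windowing and voicing detection, and correct harmonic amplitudes for formant effects.

```python
import math


def synth_voice(f0, amps, sr=16000, dur=0.1):
    """Toy stationary 'voice' (hypothetical): sum of harmonics (k+1)*f0 with linear amplitudes."""
    n = int(sr * dur)
    return [sum(a * math.sin(2 * math.pi * (k + 1) * f0 * t / sr)
                for k, a in enumerate(amps))
            for t in range(n)]


def estimate_f0(frame, sr, fmin=75.0, fmax=400.0):
    """Autocorrelation pitch estimate for one frame (assumed voiced)."""
    n = len(frame)
    best_lag, best_r = None, -1.0
    for lag in range(int(sr / fmax), int(sr / fmin) + 1):
        a, b = frame[:n - lag], frame[lag:]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        r = num / den if den > 0 else 0.0
        # Small tolerance: among near-equal peaks keep the shortest lag,
        # which guards against octave errors at multiples of the period.
        if r > best_r + 1e-6:
            best_lag, best_r = lag, r
    return sr / best_lag


def mean_f0(x, sr, frame_s=0.04):
    """Mean of frame-wise f0 estimates over non-overlapping frames (no voicing check)."""
    size = int(sr * frame_s)
    starts = range(0, len(x) - size + 1, size)
    return sum(estimate_f0(x[s:s + size], sr) for s in starts) / len(starts)


def harmonic_db(x, sr, freq):
    """Level in dB of a single DFT component at freq (no window; toy only)."""
    re = sum(v * math.cos(2 * math.pi * freq * i / sr) for i, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * i / sr) for i, v in enumerate(x))
    return 20 * math.log10(math.hypot(re, im))


def h1_minus_h2(x, sr, f0):
    """H1-H2: first-harmonic level minus second-harmonic level, in dB."""
    return harmonic_db(x, sr, f0) - harmonic_db(x, sr, 2 * f0)


if __name__ == "__main__":
    x = synth_voice(200.0, [1.0, 0.5])  # second harmonic at half the amplitude
    f0 = mean_f0(x, 16000)
    print(round(f0, 1), round(h1_minus_h2(x, 16000, f0), 2))  # 200.0 6.02
```

Halving the second harmonic's amplitude yields an H1-H2 of about 6 dB (20·log10 2). Higher H1-H2 values are commonly associated with a steeper spectral tilt. Autocorrelation is only one of several pitch-tracking approaches, and it was chosen here because it is short to write. It is not necessarily what the study used.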
