What question did this study set out to answer?

This research investigates how various acoustic features and contextual factors affect the perception of synthetic speech in public spaces.

March 27, 2026 | Open Access

Modeling perceptual dimensions of synthetic speech in High-SNR public environments: An SD-based evaluation of clarity, comfort, and preference

Key Points

The study used a factorial design to manipulate speaker gender, evaluator gender, pitch, sound pressure level, and location across 60 speech samples.
Participants evaluated each sample on multiple semantic differential scales assessing liking and listening ease.
Principal component analysis was applied to identify the key perceptual factors underlying these evaluations.
Stepwise multiple regression and partial least-squares regression were used to determine the predictors of liking and listening ease.
Four main factors were identified: emotional positivity, auditory clarity, interpersonal comfort, and brightness.
Liking was moderately correlated with listening ease but driven by different predictors.
Acoustic comfort depended on both intelligibility and socio-affective perceptions tied to speaker identity.
Male voices received higher ratings for warmth and preference, independent of evaluator gender.

Abstract

This study investigates how acoustic features and contextual cues influence the perceptual evaluation of synthetic speech in quiet public environments, such as train stations and airports. Using a factorial design, we manipulated five key factors across 60 synthetic speech samples: speaker gender, evaluator gender, pitch, sound pressure level (SPL), and simulated location. Participants evaluated each sample on multiple semantic differential scales assessing liking and listening ease. Principal component analysis revealed four key factors: emotional positivity, auditory clarity, interpersonal comfort, and brightness. Notably, liking and listening ease were moderately correlated but driven by different acoustic predictors. Stepwise multiple regression and partial least-squares regression indicated that “pleasant” and “comfortable” primarily predicted liking, whereas clarity-related cues more strongly accounted for listening ease. Additionally, male voices were rated higher in warmth and preference, regardless of evaluator gender. These results suggest that acoustic comfort in public settings depends not only on intelligibility but also on socio-affective perceptions tied to speaker identity and vocal attributes.

• Assessed speech intelligibility and preference in quiet public sound settings.
• Manipulated pitch, SPL, and gender in synthetic speech using a factorial design.
• Found distinct predictors for “liking” versus “ease of listening” judgments.
• Male voices were preferred across genders despite matched intelligibility.
• PLS regression revealed affective and acoustic cues as key evaluative factors.
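
As a rough illustration of the principal component analysis step mentioned in the abstract, the sketch below extracts the first principal component from a correlation matrix of rating scales. It is a minimal sketch under stated assumptions, not the authors' pipeline. The ratings are randomly generated placeholders (not the study's data), the scale labels "clear" and "bright" are hypothetical stand-ins, and power iteration is used here only to keep the example dependency-free; the paper does not say how its components were computed.

```python
import math
import random


def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of `rows` (one row per rated sample)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(k)]
    # z-score every column so that each scale contributes equally
    z = [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(k)]
            for a in range(k)]


def first_component(corr, iterations=500):
    """Leading eigenvector (loadings) and eigenvalue of `corr` via power iteration."""
    k = len(corr)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for the unit vector v
    eigenvalue = sum(v[a] * sum(corr[a][b] * v[b] for b in range(k))
                     for a in range(k))
    return v, eigenvalue


# ASSUMPTION: placeholder ratings, NOT the study's data. Two latent
# tendencies (affect, clarity) plus noise, centred on the midpoint of a
# hypothetical 7-point scale, for 60 samples.
random.seed(0)
SCALES = ["pleasant", "comfortable", "clear", "bright"]  # last two are hypothetical
ratings = []
for _ in range(60):
    affect = random.gauss(0, 1)
    clarity = random.gauss(0, 1)
    ratings.append([
        4 + affect + random.gauss(0, 0.5),
        4 + affect + random.gauss(0, 0.5),
        4 + clarity + random.gauss(0, 0.5),
        4 + 0.5 * clarity + random.gauss(0, 0.5),
    ])

corr = correlation_matrix(ratings)
loadings, eigenvalue = first_component(corr)
# For a correlation matrix the eigenvalues sum to the number of scales
explained = eigenvalue / len(SCALES)

for name, loading in zip(SCALES, loadings):
    print(f"{name:12s} loading = {loading:+.2f}")
print(f"variance explained by first component: {explained:.0%}")
```

In practice, a full analysis would extract several components (the abstract reports four) and would typically rely on a statistics library; the printed loadings here only show how scales that move together end up grouped on one component.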
