This study investigates how acoustic features and contextual cues influence the perceptual evaluation of synthetic speech in quiet public environments, such as train stations and airports. Using a factorial design, we manipulated five key factors across 60 synthetic speech samples: speaker gender, evaluator gender, pitch, sound pressure level (SPL), and simulated location. Participants evaluated each sample on multiple semantic differential scales assessing liking and listening ease. Principal component analysis revealed four key factors: emotional positivity, auditory clarity, interpersonal comfort, and brightness. Notably, liking and listening ease were moderately correlated but driven by different acoustic predictors. Stepwise multiple regression and partial least-squares regression indicated that “pleasant” and “comfortable” primarily predicted liking, whereas clarity-related cues more strongly accounted for listening ease. Additionally, male voices were rated higher in warmth and preference, regardless of evaluator gender. These results suggest that acoustic comfort in public settings depends not only on intelligibility but also on socio-affective perceptions tied to speaker identity and vocal attributes.

• Assessed speech intelligibility and preference in quiet public sound settings.
• Manipulated pitch, SPL, and gender in synthetic speech using a factorial design.
• Found distinct predictors for “liking” versus “ease of listening” judgments.
• Male voices were preferred across genders despite matched intelligibility.
• PLS regression revealed affective and acoustic cues as key evaluative factors.
Asakura et al. (Wed,) studied this question.