Spoken language carries critical information about both "what" is being said and "who" is talking. Listeners use various cues in speech to identify talkers. Talkers can be identified based primarily on acoustic cues, but access to higher-level cues depends on how familiar the listener is with the language. Do listeners incur greater processing costs when more cues are available? Or do listeners leverage cues in a reverse hierarchical manner, using higher-level cues more strategically to reduce processing costs? Here, we combined multiple methodological approaches, including pupillometry, analysis of behavioral error patterns, and drift diffusion models (DDMs), to examine how learners identify novel talkers: how they accrue sensory evidence, balance cue usage, and manage processing costs during talker identity learning. Native English-speaking adults learned to identify talkers speaking English and Mandarin and were then tested on recognizing those talkers speaking new sentences, while their pupillary responses, a proxy for processing cost, were recorded. Talker identification was more accurate and induced less pupil dilation in English than in Mandarin. Analysis of error patterns showed that listeners relied more heavily on low-level acoustic features in Mandarin than in English, and DDMs revealed that higher evidence accumulation rates were associated with smaller pupil dilation. These findings suggest that, whenever available, listeners primarily use higher-level abstract representations to identify talkers from the beginning of the learning process, whereas in unfamiliar languages they rely on less efficient, lower-level features of speech sounds. This pattern is consistent with the predictions of the reverse hierarchical framework.
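To make the DDM term "evidence accumulation rate" (drift rate) concrete, the following is a minimal sketch of a generic two-boundary drift diffusion process simulated with small Euler steps. It is not the authors' model or fitting procedure, which the abstract does not describe. The function names `simulate_ddm_trial` and `summarize` and all parameter values (drift `v`, boundary separation `a`, non-decision time `t0`, noise `sigma`, step `dt`) are hypothetical choices for illustration, and pupil dilation is not modeled at all. The sketch only shows the general property that, with everything else held fixed, a higher drift rate yields faster and more accurate decisions.

```python
import random
import statistics


def simulate_ddm_trial(v, a, t0=0.3, z=None, sigma=1.0, dt=0.001, rng=None, max_t=10.0):
    """Simulate one trial of a two-boundary drift diffusion process.

    Illustrative assumptions: evidence starts at z (default a / 2, i.e. no
    response bias), drifts at rate v with Gaussian noise of scale sigma, and
    stops at the upper boundary a ("correct") or the lower boundary 0 ("error").
    Returns (correct, rt), where rt adds the non-decision time t0.
    """
    rng = rng or random.Random()
    x = a / 2 if z is None else z
    t = 0.0
    step_sd = sigma * dt ** 0.5  # SD of the noise increment per Euler step
    while 0.0 < x < a and t < max_t:
        x += v * dt + rng.gauss(0.0, step_sd)
        t += dt
    return x >= a, t0 + t


def summarize(v, a=1.5, t0=0.3, n_trials=1000, seed=0):
    """Return (accuracy, mean RT in seconds) over simulated trials for drift rate v."""
    rng = random.Random(seed)
    trials = [simulate_ddm_trial(v, a, t0=t0, rng=rng) for _ in range(n_trials)]
    accuracy = sum(correct for correct, _ in trials) / n_trials
    mean_rt = statistics.mean(rt for _, rt in trials)
    return accuracy, mean_rt


if __name__ == "__main__":
    # Hypothetical low vs. high evidence accumulation rates.
    for v in (0.5, 2.0):
        acc, rt = summarize(v)
        print(f"drift={v}: accuracy={acc:.2f}, mean RT={rt:.2f} s")
```

Holding boundary separation and non-decision time fixed isolates the effect of drift rate; in this simplified setting, a listener who extracts evidence more efficiently (higher `v`) reaches a decision sooner and errs less often.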
Choi et al. studied this question.