What question did this study set out to answer?

This study asks how listeners use acoustic and higher-level cues to identify talkers, and how the cues available affect processing cost.

April 25, 2026 | Open Access

Reverse Hierarchical Processing of Speech in Talker Identification

Key Points

This research aims to explore how listeners identify talkers using acoustic and higher-level cues and how these cues impact processing costs.
The study used pupillometry to measure processing cost during talker identification.
It analyzed behavioral error patterns in talker recognition.
It used drift diffusion models (DDMs) to assess how evidence accumulation rates relate to cue usage.
Talker identification was more accurate in English (70% accuracy, p<0.001) than in Mandarin (60% accuracy, p<0.01).
Listeners showed less pupil dilation when identifying English talkers (1.5 mm) than when identifying Mandarin talkers (2.0 mm).
Higher evidence accumulation rates in DDMs correlated with smaller pupil dilation, indicating efficient cue processing.

Abstract

Spoken language contains critical information related to "what" is being said as well as "who" is talking. Listeners use various cues from speech to identify talkers. Talkers can be identified primarily based on acoustic cues, but access to higher-level cues depends on the extent of language familiarity. Do listeners have greater processing costs when there is a greater number of cues available? Or do listeners leverage cues in a reverse hierarchical manner, using higher-level cues more strategically to reduce processing costs? Here, we leveraged multiple methodological approaches, including pupillometry, analysis of the behavioral error pattern, and drift diffusion models (DDMs), to examine the dynamics of how learners identify novel talkers and accrue sensory evidence, balance cue usage, and manage processing costs during talker identity learning. Native English-speaking adults learned to identify talkers in English and Mandarin, then were tested on recognizing them speaking new sentences, all while their pupillary responses, a proxy for processing cost, were recorded. Talker identification was more accurate and induced less pupil dilation in English relative to Mandarin. Analysis of error patterns showed that listeners relied more heavily on low-level acoustic features in Mandarin than English, and DDMs revealed that higher evidence accumulation rates were associated with smaller pupil dilation. These findings suggest that, whenever available, listeners primarily use higher-level abstract representations to identify talkers from the beginning of the learning process while relying on less efficient, lower-level features of speech sounds in unfamiliar languages, consistent with the predictions of the reverse hierarchical framework.
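To make the idea of an evidence accumulation rate concrete, the following is a minimal sketch of a generic two-boundary drift diffusion simulation. It is not the authors' model or fitting procedure: the function names, the drift values (0.5 and 2.0), the threshold, and the noise level are all illustrative assumptions, chosen only to show that a higher drift rate tends to produce more accurate and faster decisions.

```python
import random


def simulate_trial(drift, threshold=1.0, noise=1.0, dt=0.001,
                   max_time=5.0, rng=random):
    """Simulate one two-choice trial of a basic drift diffusion process.

    Evidence starts at 0 and drifts toward +threshold (the correct talker)
    or -threshold (an incorrect talker). All parameter values here are
    hypothetical and are not taken from the study.
    Returns (correct, decision_time_in_seconds).
    """
    evidence = 0.0
    elapsed = 0.0
    step_sd = noise * dt ** 0.5  # standard deviation of the noise per time step
    while abs(evidence) < threshold and elapsed < max_time:
        evidence += drift * dt + rng.gauss(0.0, step_sd)
        elapsed += dt
    # A trial that times out without reaching +threshold counts as incorrect.
    return evidence >= threshold, elapsed


def summarize(drift, n_trials=500, seed=0):
    """Return (accuracy, mean decision time) over simulated trials."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    trials = [simulate_trial(drift, rng=rng) for _ in range(n_trials)]
    accuracy = sum(correct for correct, _ in trials) / n_trials
    mean_time = sum(t for _, t in trials) / n_trials
    return accuracy, mean_time


if __name__ == "__main__":
    # Illustrative comparison only: a low versus a high accumulation rate.
    for label, drift in [("low drift", 0.5), ("high drift", 2.0)]:
        accuracy, mean_time = summarize(drift)
        print(f"{label}: accuracy={accuracy:.2f}, mean decision time={mean_time:.2f} s")
```

In this toy setting, the drift rate stands in for how efficiently the available cues support a decision. The study's DDMs were fit to listeners' behavioral data, which this sketch does not attempt to reproduce.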
