Following a conversation in a complex environment requires a listener to maintain continuous attention and use other sensory input to maximize speech reception. In a multi-talker conversation, that means turning the head toward the active talker's location and using visual cues (e.g., lip movements and facial expressions). To test the interactions between these processes, we devised an experiment with two competing speech streams. The first stream used materials adapted from a video-podcast with four talkers, each on a separate screen, arrayed horizontally with 30° separation. The second stream was composed of monosyllabic words interspersed with digits (CNIT; Ozmeral et al., Int. J. Audiol. 59(6), 434–442 (2020)). Both streams were presented co-located, at equal SNR, from the same loudspeaker as the active talker in the video-podcast conversation. Head movements were recorded during each 90-s trial. Before each trial, participants (N = 13, young, normal-hearing) were cued to perform either a single task (detect names or digits) or a dual task (detect names and digits). Accuracy was significantly higher for detecting digits than for detecting names in both the single- and dual-task conditions. Correlations between head orientation and talker location show that the more closely participants’ head orientation followed the talker location, the more name-detection accuracy improved and the more digit-detection accuracy degraded. Together, these results demonstrate the complex interactions between head movement and visual stimuli in the segregation of competing auditory streams.
Higgins et al. (Tue,) studied this question.