Speech perception is a complex multimodal process that often involves auditory and visual information. Studies of audio-visual asynchrony in speech reveal an asymmetry, with listeners performing better when visual cues precede rather than follow the auditory input. The current study investigates how audio-visual information and asynchrony affect sentence intelligibility across different acoustic backgrounds. We hypothesized that visual information and synchrony play a more important role under informational masking (e.g., speech-in-speech conditions) than under energetic masking (e.g., speech-in-noise conditions), because visual cues may help listeners perceptually segregate the target speech from the masker speech. Experiment 1 tested this hypothesis by comparing audio-only and audio-visual speech intelligibility in backgrounds of noise or competing talkers over a range of target-to-masker ratios. Our findings confirmed a greater audio-visual benefit for speech-in-speech than for speech-in-noise conditions. Experiment 2 tested the effects of audio-visual temporal asynchrony. Preliminary results show that, in both conditions, performance plotted as a function of asynchrony follows a surprisingly shallow function relative to previous work using single words or vowels. If confirmed, these results suggest that sentence-level stimuli may involve additional (cognitive) processes that mitigate the effects of temporal asynchrony on audio-visual speech intelligibility. Work supported by NIH grant R01 DC016119.
Lee et al. (Tue,) studied this question.