Listeners face a critical challenge in speech perception: acoustic cues to a given speech sound often unfold asynchronously. Traditional work suggests listeners solve this problem by processing each cue immediately and continuously to update higher-level interpretations. However, recent findings suggest that certain speech sounds (e.g., voiceless sibilant fricatives) may be buffered—judgment about the identity of the fricative is withheld until the vocoid arrives. It is unclear what triggers the release of this buffer: is it driven by specific incoming information or fixed cue duration? Using the visual-world paradigm, we tested this by extending the durations of /s/ and /ʃ/ to double (∼300 ms) and triple (∼450 ms) their typical lengths (∼150 ms). Consistent with previous findings, listeners withheld processing typical-length fricatives until frication offset. For double-length fricatives, listeners extended the buffer, waiting for the entire fricative duration before committing to a decision. When fricative lengths were tripled, partial commitment emerged during the frication period but final decisions still awaited additional cues. This suggests that buffering is largely based on the arrival of the vocoid, though at extreme durations listeners may access partial information. Thus, listeners flexibly delay processing as they wait to integrate current acoustic information with upcoming cues.
Kim et al. studied this question.