Speech perception depends on the integration of multiple acoustic cues that unfold over time, and fine-grained details about these cues are crucial for speech sound categorization. Classic work in speech perception argued that these acoustic details are lost after sounds are sorted into phoneme categories. However, recent research shows that this information may be maintained over longer timescales spanning multiple words. Determining the extent and nature of maintenance is necessary for developing models that accurately represent information at each stage of speech processing. The current study investigates cue integration over prolonged timescales based on approaches used in visual perception that demonstrate maintenance for multiple cues. On each trial, listeners heard 1–5 instances of a word from a minimal pair, separated by gaps that were approximately 1–8 seconds long. Samples varied in voice onset time and were drawn from a distribution spanning the English /b/ and /p/ categories. After listening to each set of samples, listeners made a two-alternative forced choice response between the voiced and voiceless members of the word pair. Accuracy improved as more samples were presented (p < 0.001) but was not affected by gap length (p > 0.05), suggesting that listeners used and maintained information about the cues over both short and long timescales.
Kennedy et al. (Wed,) studied this question.