Vowel systems tend to be organized in similar ways cross-linguistically, with vowels dispersed within an acoustic space defined by the first two formants, F1 (height) and F2 (backness). While vowel dispersion models successfully predict vowel contrasts along the F1 dimension, they fail to predict that far fewer vowel contrasts occur along F2. Work since the 1970s has appealed to the auditory system to explain this asymmetry. We hypothesize that nonlinearities in the ascending subcortical auditory pathway affect the representation of formants and thereby constrain which vowels may occur. We constructed ecologically valid vowel systems using synthetic vowels with added speech-shaped noise. We simulated neural responses to the synthetic vowels using a physiological model and used a linear support vector machine to classify the responses into vowel categories. Preliminary results show that, in noise, F1 has a more robust regional coding scheme than the simpler representation of F2. F1 supports more contrasts than F2, which is more vulnerable to noise. These results suggest that auditory peripheral tuning and nonlinearities, such as saturation, compression, and suppression, constrain both the encoding of vowel spectra and the structure of vowel systems. Work supported by NIDCD-R01-010813.
Pyskaty et al. (Tue,) studied this question.
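
The classification step described in the abstract can be illustrated with a minimal sketch. It does not reproduce the physiological model or the synthetic vowels: as a loudly labeled assumption, each "neural response" here is a hypothetical two-dimensional cue vector (a category center plus Gaussian noise), and the linear support vector machine is trained with a Pegasos-style stochastic subgradient method on the hinge loss. All function names, category centers, noise levels, and hyperparameters are illustrative and are not taken from the study.

```python
import random

def simulate_responses(centers, noise_sd, n_per_class, rng):
    """Toy stand-in for simulated neural responses (an assumption, not the
    physiological model): category center plus Gaussian noise, with a
    constant 1.0 appended so the bias is learned as an extra weight."""
    X, y = [], []
    for label, center in zip((-1, 1), centers):
        for _ in range(n_per_class):
            X.append([c + rng.gauss(0.0, noise_sd) for c in center] + [1.0])
            y.append(label)
    return X, y

def train_linear_svm(X, y, rng, lam=0.01, epochs=20):
    """Binary linear SVM: stochastic subgradient descent on the
    L2-regularized hinge loss with step size 1 / (lam * t)."""
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            # Margin is evaluated with the weights from before this update.
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink the weights (gradient of the regularizer).
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss subgradient applies only to margin violations.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def accuracy(w, X, y):
    """Fraction of responses whose predicted sign matches the label."""
    correct = 0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi))
        correct += (1 if score >= 0 else -1) == yi
    return correct / len(X)

def contrast_accuracy(noise_sd, seed=0):
    """Train and test on one hypothetical two-vowel contrast in noise."""
    rng = random.Random(seed)
    centers = [(-1.0, 0.0), (1.0, 0.0)]  # hypothetical normalized cue values
    X_train, y_train = simulate_responses(centers, noise_sd, 100, rng)
    X_test, y_test = simulate_responses(centers, noise_sd, 100, rng)
    w = train_linear_svm(X_train, y_train, rng)
    return accuracy(w, X_test, y_test)

acc_low = contrast_accuracy(noise_sd=0.3)
acc_high = contrast_accuracy(noise_sd=2.0)
print(f"low noise: {acc_low:.2f}, high noise: {acc_high:.2f}")
```

In this toy setting, classification accuracy serves as a proxy for how well a contrast survives noise. Comparing accuracies for contrasts that differ along different cue dimensions would mirror the F1 versus F2 comparison in spirit only, since the numbers produced here say nothing about real auditory coding.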