This study examines the production of Voice Onset Time (VOT) in English stop consonants by native speakers of American English and Arabic-speaking English as a Foreign Language (EFL) learners at two proficiency levels. VOT, the interval between the release of a stop and the onset of voicing, is a primary acoustic cue for distinguishing voiced from voiceless stops. Drawing on Flege’s Speech Learning Model (SLM), the research investigates whether learners differentiate voiceless from voiced stops (/p/ vs. /b/), whether they produce target-like aspiration patterns in /sp/ clusters, and whether proficiency influences VOT patterns. Data were collected from 29 native English speakers and 58 Arabic-speaking learners, who produced minimal pairs and /sp/ cluster words embedded in carrier sentences. All tokens were annotated manually in Praat and analyzed using linear mixed-effects models. Results showed that native speakers maintained robust VOT distinctions, whereas Novice-High learners exhibited overlapping /p/ and /b/ distributions and inappropriate aspiration in clusters. Intermediate-High learners produced more target-like patterns, suggesting early stages of L2 category formation. The findings support the SLM’s predictions and underscore the need for explicit instruction on VOT contrasts and for improvements in AI-assisted pronunciation feedback tools. The study concludes with pedagogical implications for pronunciation instruction: for example, teachers working with Arabic-speaking learners should highlight the role of aspiration in English voicing contrasts and explicitly address its absence in /sp/ clusters.
Aldamen et al. studied this question.