The global diffusion of English has fostered diverse localized varieties, among which Pakistani English (PE) continues to evolve within its multilingual and socially stratified environment. While prior research has investigated various phonological features of PE, limited attention has been paid to diphthongs, complex vowel units that are susceptible to sociophonetic variation. This study examines gender-based differences in the production and perception of English diphthongs among 40 non-native Pakistani English speakers (20 males, 20 females), selected through purposive stratified sampling. The study adopts a mixed-methods approach within a descriptive-explanatory framework, in which participants completed structured production and perception tasks. Acoustic analysis, conducted in Praat, measured formant frequencies (F1, F2), duration, and intensity. Statistical analysis (Shapiro–Wilk normality tests, independent-samples t-tests, and Mann–Whitney U tests) confirmed significant gender-based variation. Female speakers produced longer vowel durations (p < 0.01), more dynamic formant movements, and higher spectral clarity, in line with prior studies associating female speech with more precise articulation and prestige-oriented linguistic behavior. In perception tasks, females showed significantly greater accuracy and phonological sensitivity (p < 0.05, d = 0.62), whereas males showed lower discrimination across diphthong contrasts. These findings indicate that gender significantly affects both the articulatory and perceptual dimensions of diphthong use in PE. The study offers pedagogical implications for gender-aware L2 instruction and contributes to ongoing sociophonetic research on localized English varieties. Future work could explore intersectional factors such as education, as well as longitudinal change.
Rashid et al. studied this question.
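The abstract reports a between-group effect size (d) and names the Mann–Whitney U test, but it does not say how either was computed. As a minimal sketch, the block below assumes the common pooled-standard-deviation form of Cohen's d and the pairwise-count definition of the U statistic. The function names and the score lists are hypothetical illustrations and are not data or code from the study.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def mann_whitney_u(a, b):
    """U statistic for group a: count of pairs (x, y) with x > y, ties counted as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


# Hypothetical perception-accuracy scores (proportion correct), NOT study data.
female_scores = [0.80, 0.85, 0.90, 0.75, 0.95]
male_scores = [0.70, 0.80, 0.65, 0.85, 0.75]

d = cohens_d(female_scores, male_scores)
u = mann_whitney_u(female_scores, male_scores)
print(f"Cohen's d = {d:.2f}, U = {u}")
```

This sketch computes only the descriptive quantities. Obtaining a p-value, or running the Shapiro–Wilk normality test, would normally be left to a statistics package rather than hand-rolled code.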