This study examines phonemic differences in the McGurk effect in Mandarin Chinese, with emphasis on acoustic features and audiovisual integration. Fifty-nine native Mandarin speakers completed three tasks: Basic Phoneme Selection (Task 1), Tone-Consonant Separation Judgment (Task 2), and Complex Articulation Contrasts (Task 3). A one-way ANOVA revealed a significant main effect of task type on accuracy (F=20.251, p<0.001). Accuracy was highest in Task 1 (M=0.85), followed by Task 3 (M=0.65), and lowest in Task 2 (M=0.45), indicating that participants had difficulty resolving tone-consonant conflicts. Fricative/stop combinations (e.g., /f/ vs. /p/) elicited a higher fusion perception rate than pure stop combinations (F=7.144, p=0.026), which is attributed to acoustic ambiguity and visual complementarity. Labiodental sounds (e.g., /f/) showed significantly higher fusion rates (M=0.16, SD=0.12) than non-labiodental sounds (M=0.03, SD=0.04), highlighting the role of visually salient cues (e.g., lip-teeth contact) in perceptual integration. The findings suggest that Mandarin speakers show heightened sensitivity to segmental conflicts, potentially influenced by the structure of tonal languages. These results can inform speech synthesis optimization (e.g., lip-sync enhancement) and the design of cross-linguistic audiovisual algorithms. Future research should integrate neuroimaging to explore the underlying neural mechanisms and examine dialectal effects on multimodal processing.
Yao et al. studied this question.
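For readers unfamiliar with the test behind the reported task effect, the following is a minimal sketch of how a one-way ANOVA F statistic is computed from per-group accuracy scores. The function name `one_way_anova_f` and the score lists are hypothetical and made up for illustration; they are not the study's data or analysis code. The sketch also assumes independent groups, whereas a design in which the same participants complete all three tasks would often call for a repeated-measures analysis instead.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of per-group score lists.

    Illustrative helper (hypothetical name); treats the groups as independent.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up accuracy scores for three tasks (NOT the study's data).
task1 = [0.90, 0.80, 0.85, 0.90, 0.80]
task2 = [0.40, 0.50, 0.45, 0.50, 0.40]
task3 = [0.60, 0.70, 0.65, 0.70, 0.60]

f_stat, df_b, df_w = one_way_anova_f([task1, task2, task3])
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

A large F value means that the differences between task means are large relative to the variability within each task; the p value is then read from the F distribution with the two degrees of freedom returned above.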