This study explores the potential and limitations of GPT-4o, a multimodal generative AI equipped with real-time voice interaction, in generating responses to Korean speaking proficiency test tasks. To this end, the study collected GPT-4o's voice-based responses to six TOPIK speaking tasks and analyzed them both quantitatively and qualitatively based on evaluations by 18 Korean language education experts. The results indicate that GPT-4o demonstrates high performance in grammatical accuracy and logical structure but reveals clear limitations in prosodic features such as intonation and speech rate, as well as in pragmatic appropriateness. In particular, the experts repeatedly pointed out that its overly advanced vocabulary and unnatural delivery were misaligned with actual learner proficiency levels. Nevertheless, the study found that GPT-4o holds potential as a supplementary tool for speaking test preparation, especially for modeling ideal responses, providing self-directed feedback, and enabling repeated practice in autonomous learning contexts. This research underscores the applicability of multimodal AI in Korean language education and suggests the need for future studies that compare various multimodal models and examine response patterns under more complex input conditions combining visual, contextual, and interactive elements.
Jiyeong Jang
Sooyeon Park
The Korean Association of General Education
Jang et al. studied this question.
www.synapsesocial.com/papers/68c193de9b7b07f3a0617918 — DOI: https://doi.org/10.46392/kjge.2025.19.4.205