Grounded in the Interaction Hypothesis, this study investigates the mechanisms underlying collaborative English speaking learning driven by digital humans. Addressing the persistent problem of interaction fossilization in computer-supported collaborative learning, the study argues that learners’ low willingness to communicate and insufficient collaborative scaffolding often prevent peer interaction from developing into meaningful negotiation of meaning. Against this backdrop, digital human companions, characterized by multimodal interaction and enhanced social presence, are examined as potential mediators of interactional conditions conducive to second language acquisition. Drawing on interactionist theory, the study proposes a three-dimensional closed-loop interaction mechanism consisting of Pre-adaptation, Process Synergy, and Feedback Iteration. The pre-adaptation dimension calibrates learner profiles, task design, and affective readiness to establish optimal conditions for interaction. The process synergy dimension focuses on sustaining negotiation of meaning through scaffolded interaction, clarification requests, and prompts that encourage pushed output. The feedback iteration dimension embeds non-intrusive, data-driven corrective feedback and affective regulation into ongoing interaction, enabling continuous adjustment of task difficulty and interaction strategies. Theoretically, the proposed mechanism extends interactionist explanations of language learning to multimodal, human–AI collaborative environments. Practically, it offers a structured and implementable framework for designing intelligent speaking tasks that enhance interaction depth, feedback quality, and learner engagement. The study concludes that digital human companions can function as interactional facilitators rather than mere tools, supporting the transition from superficial participation to sustained, meaning-focused collaborative speaking.
Yang et al. studied this question.