Open-source large language models (LLMs) are increasingly explored in educational contexts because of their transparency, adaptability, and alignment with institutional governance and equity considerations. Despite this growing interest, empirical research on how open-source LLMs are deployed in education, and on what evidence supports their integration, remains limited and fragmented. This paper presents a state-of-the-art narrative review of peer-reviewed, human empirical studies examining the use of open-source LLMs in education. Guided by three research questions, the review synthesizes how open-source LLMs are deployed across instructional contexts, what learner-related evidence is reported, and how teachers engage in human–AI collaboration. The reviewed literature is concentrated in higher education, particularly in computer science and programming, with applications focused on post-class tutoring, guidance, and formative feedback. Learner perceptions are generally positive, but evidence linking open-source LLM use to measurable learning outcomes is still emerging and inconsistent. Through interpretive synthesis, the review articulates a four-role model (Designer, Facilitator, Monitor, and Evaluator) that captures how teacher agency is enacted across AI-supported instructional workflows. The review also maps the recurring orchestration dimensions, decision points, and tensions that characterize early implementations, and it proposes a minimal orchestration reporting scaffold (configuration, boundaries, logging, adjudication) intended to support auditability and cross-study comparison as the empirical base develops.
Lin et al. (Fri,) studied this question.