Given the rapid advancements, and notable failures, in large language model generative AI (genAI), there are elevated expectations that retrieval-augmented generation (RAG) AI chatbots will revolutionize higher education by offering individualized, always-available tutoring based on validated content. However, experimental evidence on their effectiveness remains scarce. Using a randomized controlled field experiment, this study examines the effects of a genAI chatbot on key precursors to learning success (i.e., interest, self-efficacy, and engagement) and academic achievement for ≈500 undergraduate students across two modalities (in-person and asynchronous online). We completed a semester-long controlled experiment with pre- and post-treatment surveys and tests. Despite expectations, we found the genAI chatbot had no statistically significant impact on any measured outcome. These early results challenge assumptions about AI’s instructional effectiveness and suggest universities should further investigate the pedagogical value of AI chatbots before making substantial investments or committing to long-term contracts. We recommend future research to increase the generalizability of the findings and to discover methods to improve the efficacy of AI chatbots in higher education.
Thoeni et al. studied this question.