Large Language Models (LLMs) excel at generating content at remarkable speed. However, they are imperfect and still make various mistakes. In Computer Science education, as LLMs are widely recognized as "AI pair programmers," it becomes increasingly important to train students to evaluate and debug LLM-generated code. In this work, we introduce HypoCompass, a novel system that facilitates deliberate practice in debugging: human novices play the role of Teaching Assistants and help LLM-powered teachable agents debug code. We enable effective task delegation between students and LLMs in this learning-by-teaching environment: students focus on hypothesizing the cause of code errors, while adjacent skills like code completion are offloaded to LLM agents. Our evaluations demonstrate that HypoCompass generates high-quality training materials (e.g., bugs and fixes), outperforming human counterparts fourfold in efficiency, and significantly improves student debugging performance by 12% from pre-test to post-test.
Qianou Ma
Hua Shen
Kenneth R. Koedinger
University of Washington
Carnegie Mellon University
Ma et al. studied this question.
www.synapsesocial.com/papers/68d469d631b076d99fa66e29 — DOI: https://doi.org/10.24963/ijcai.2025/1217