Student misconceptions in mathematics often go undetected and impede understanding and progress. This study introduces a hybrid framework that uses large language models (LLMs) to classify these errors. We fine-tune three models (Gemma-2, DeepSeek, and Gemma-3) on a dataset of student responses to fraction and probability problems, addressing challenges such as noisy explanations and class imbalance (83.7% unlabeled errors). A weighted voting ensemble achieves a Mean Average Precision at 3 (MAP@3) of 0.68, surpassing the individual models by 4-10%. Analyses of data characteristics, including short explanation lengths (mean 56.57 characters) and category distributions (35.7% correct, 16.3% misconceptions), reveal systemic issues in student reasoning, such as scale errors in probability. Visualizations and metrics support the framework’s robustness and offer educators actionable insights for personalized interventions. The approach advances AI-driven educational tools and points toward scalable, real-time misconception detection.
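To make the two central quantities concrete, the following is a minimal sketch of a weighted voting ensemble over ranked predictions and of the MAP@3 metric. The abstract does not specify the voting rule, so the reciprocal-rank scoring used here is an assumption, as are the model weights, the label names, and the function names `weighted_vote` and `map_at_3`. MAP@3 is computed in its usual single-label form: each response scores 1/rank if the true label appears in the top three predictions, and 0 otherwise.

```python
from collections import defaultdict


def weighted_vote(model_rankings, weights, k=3):
    """Merge per-model ranked label lists into one top-k list.

    Assumed scheme (not confirmed by the source): a label placed at
    0-based rank r by a model earns that model's weight / (r + 1).
    """
    scores = defaultdict(float)
    for model_name, ranking in model_rankings.items():
        for rank, label in enumerate(ranking):
            scores[label] += weights[model_name] / (rank + 1)
    # Sort by descending score; break ties alphabetically for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in ordered[:k]]


def map_at_3(true_labels, predictions):
    """Mean Average Precision at 3 with one true label per response."""
    total = 0.0
    for truth, ranked in zip(true_labels, predictions):
        for rank, label in enumerate(ranked[:3]):
            if label == truth:
                total += 1.0 / (rank + 1)
                break  # only the first hit counts
    return total / len(true_labels)


# Hypothetical weights and labels, for illustration only.
weights = {"gemma2": 0.40, "deepseek": 0.35, "gemma3": 0.25}
rankings = {
    "gemma2": ["scale_error", "none", "additive"],
    "deepseek": ["none", "scale_error", "additive"],
    "gemma3": ["scale_error", "additive", "none"],
}
ensemble_top3 = weighted_vote(rankings, weights)

score = map_at_3(
    ["a", "b", "c"],
    [["a", "x", "y"], ["x", "b", "y"], ["x", "y", "z"]],
)
```

In this toy input, `ensemble_top3` ranks `scale_error` first because two of the three models place it at the top, and `score` averages 1, 1/2, and 0 across the three responses. Other voting rules, such as averaging class probabilities, would fit the same interface.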
Alanazi et al. (Mon,) studied this question.