Qualitative data provides rich insight into student and educator thinking but remains difficult to analyse systematically at scale. The rise of generative artificial intelligence (GenAI) introduces new opportunities for pattern recognition and interpretive support, while also raising questions about how such systems can be responsibly embedded in educational research workflows. This study investigates the application of a fine-tuned GenAI model to classify metacognitive elements in student self-reflections and examines the methodological and epistemological implications of this process. In partnership with a secondary school, a university, and a state education department, the study analysed more than 14,000 student reflection artefacts collected between 2018 and 2023. A total of 4,631 samples were manually coded for four sub-elements of metacognition: Goal Setting, Strategy Choice, Reflection on Learning, and Effort Regulation. These coded samples were then used to train and evaluate a fine-tuned GPT-4o-mini model, which achieved 80.98% classification accuracy. However, our analysis also raises critical questions. While the model could detect the presence of metacognitive constructs, it lacked the pedagogical and contextual grounding to assess their quality or relevance, and the process required significant human effort. These findings highlight the need to reconceptualise GenAI not as a replacement for human judgement, but as a supportive analytic partner. We argue that co-design processes between educators, researchers, and developers are essential to ensure AI systems are trustworthy, theoretically grounded, and practically useful. The approach outlined in this study provides a roadmap for extending the use of GenAI to other complex educational constructs, with system design that reflects the needs and values of all system actors.
Marrone et al. (Sun,) studied this question.