Linking assessment items to knowledge components (KCs) is essential for adaptive learning and cognitive diagnosis, but it remains a labor-intensive, expert-dependent process. This study investigates the use of Large Language Models (LLMs) to automate item–KC association through Q-matrix generation, treating the Q-matrix as an interpretable representation of item–attribute mappings. In contrast to prior studies that focus solely on tagging accuracy or linguistic classification, we propose a dual-level computational evaluation framework that jointly assesses (i) structural alignment with expert-defined Q-matrices using formal similarity and discrepancy metrics, and (ii) functional impact on predictive performance within the DINA cognitive diagnostic model, measured by RMSE and MAE under controlled experimental settings. The results show that few-shot prompting substantially improves structural alignment compared with zero-shot configurations, whereas chain-of-thought prompting provides marginal refinements. Despite structural variations across prompting strategies, predictive performance remains stable and comparable to that obtained with expert-defined Q-matrices, indicating the functional robustness of the diagnostic model. The main contribution of this study is an integrated structural–functional evaluation protocol for LLM-based knowledge tagging, together with empirical evidence that automated item–KC linking can support scalable assessment design without degrading diagnostic accuracy.
Lopes et al. (Thu,) studied this question.
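To make the two evaluation levels concrete, the following minimal Python sketch compares a toy LLM-generated Q-matrix with a toy expert Q-matrix and then scores DINA predictions against observed responses. It is illustrative only. The abstract does not name the specific similarity and discrepancy metrics, so cell-wise agreement, Hamming distance, and Jaccard similarity are assumed here as plausible stand-ins. The matrices, the mastery profile, the responses, and the slip and guess values are all hypothetical, and the function names are not taken from the study.

```python
from math import sqrt


def structural_metrics(q_expert, q_llm):
    """Compare two binary Q-matrices (items x KCs) cell by cell.

    The metric choice is an assumption; the study's exact metrics are not given.
    """
    pairs = [(e, l)
             for row_e, row_l in zip(q_expert, q_llm)
             for e, l in zip(row_e, row_l)]
    agree = sum(e == l for e, l in pairs)
    both = sum(e == 1 and l == 1 for e, l in pairs)
    either = sum(e == 1 or l == 1 for e, l in pairs)
    return {
        "cell_agreement": agree / len(pairs),
        "hamming": len(pairs) - agree,
        "jaccard": both / either if either else 1.0,
    }


def dina_prob(alpha, q_row, slip, guess):
    """Standard DINA response probability for one learner and one item.

    P(correct) = 1 - slip if the learner masters every KC the item requires,
    otherwise guess.
    """
    masters_all = all(a >= q for a, q in zip(alpha, q_row))
    return 1.0 - slip if masters_all else guess


def rmse_mae(observed, predicted):
    """Root mean squared error and mean absolute error of the predictions."""
    errors = [o - p for o, p in zip(observed, predicted)]
    n = len(errors)
    return sqrt(sum(e * e for e in errors) / n), sum(abs(e) for e in errors) / n


# Hypothetical toy data: 4 items, 3 KCs.
q_expert = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
q_llm = [[1, 0, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]  # one missing link on item 2

alpha = [1, 0, 1]        # assumed mastery profile of one learner
observed = [1, 0, 1, 0]  # assumed observed responses of that learner
slip, guess = 0.1, 0.2   # illustrative item parameters, shared by all items

m = structural_metrics(q_expert, q_llm)
pred_expert = [dina_prob(alpha, row, slip, guess) for row in q_expert]
pred_llm = [dina_prob(alpha, row, slip, guess) for row in q_llm]

print("structural:", m)
print("expert Q (RMSE, MAE):", rmse_mae(observed, pred_expert))
print("LLM Q (RMSE, MAE):", rmse_mae(observed, pred_llm))
```

In a full evaluation the slip and guess parameters and the mastery profiles would be estimated from response data for each Q-matrix, and the errors would be aggregated over many learners and held-out responses; the fixed values above only show how the quantities fit together.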