The rapid proliferation of Large Language Models (LLMs), including ChatGPT, Gemini, and others, has transformed research and educational practice while introducing serious challenges to academic integrity. These models are increasingly misused to generate student essays that conceal their true authorship, which makes robust detection mechanisms necessary. Despite growing concern, the existing literature offers few timely solutions for identifying AI-generated academic content, particularly in non-English contexts such as Arabic, where linguistic complexity and limited resources compound the challenge. This study addresses that gap by fine-tuning pretrained language models to detect AI-generated student essays in Arabic educational settings. We introduce three novel datasets designed to capture diverse AI-generation scenarios in academic writing. Our methodology employs CAMeLBERT-based models, fine-tuned for binary classification between human-authored and AI-generated essays. Experimental results show high performance across all three datasets, with an average accuracy of 95.5%, which supports both the effectiveness of the approach and its adaptability to different detection scenarios. The contributions of this work are: (1) we present a comprehensive framework for detecting AI-generated Arabic student essays, (2) we create and publicly release three benchmark datasets to facilitate future research in this domain, and (3) we demonstrate that fine-tuned Arabic language models can achieve high detection accuracy, providing educational institutions with a practical tool for safeguarding academic integrity in the era of generative AI.
Boutadjine et al. studied this question.
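The setup described in the abstract, a CAMeLBERT-based model fine-tuned for binary human-versus-AI classification and scored by accuracy, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the CAMeLBERT variant, so the checkpoint `CAMeL-Lab/bert-base-arabic-camelbert-mix` is an assumption, as are the label scheme, the 512-token limit, and every helper name (`load_classifier`, `train_step`, `predict`, `accuracy`, `mean_accuracy`). The sketch relies on the Hugging Face `transformers` library and PyTorch, which are imported lazily so that the metric helpers work on their own.

```python
"""Sketch: fine-tuning a CAMeLBERT checkpoint as a human-vs-AI essay classifier."""
from typing import List, Sequence

# Assumption: the abstract does not say which CAMeLBERT variant was used.
MODEL_NAME = "CAMeL-Lab/bert-base-arabic-camelbert-mix"
# Hypothetical label scheme for the binary task.
ID2LABEL = {0: "human", 1: "ai"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}
MAX_LENGTH = 512  # assumed truncation length (BERT's usual input limit)


def load_classifier(model_name: str = MODEL_NAME):
    """Load the tokenizer and encoder with a fresh 2-label classification head."""
    # Lazy import: downloading weights needs transformers and network access.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=2, id2label=ID2LABEL, label2id=LABEL2ID
    )
    return tokenizer, model


def train_step(essays: List[str], labels: List[int], tokenizer, model, optimizer) -> float:
    """Run one gradient update on a batch and return the cross-entropy loss."""
    import torch

    model.train()
    batch = tokenizer(essays, truncation=True, padding=True,
                      max_length=MAX_LENGTH, return_tensors="pt")
    output = model(**batch, labels=torch.tensor(labels))
    output.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return output.loss.item()


def predict(essays: List[str], tokenizer, model) -> List[int]:
    """Return the predicted label id (0 = human, 1 = ai) for each essay."""
    import torch

    model.eval()
    batch = tokenizer(essays, truncation=True, padding=True,
                      max_length=MAX_LENGTH, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return logits.argmax(dim=-1).tolist()


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(labels) == 0 or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def mean_accuracy(per_dataset: Sequence[float]) -> float:
    """Unweighted mean of per-dataset accuracies."""
    if len(per_dataset) == 0:
        raise ValueError("need at least one dataset accuracy")
    return sum(per_dataset) / len(per_dataset)
```

In use, one would call `load_classifier()`, create an optimizer such as `torch.optim.AdamW(model.parameters(), lr=2e-5)` (the learning rate is an illustrative choice, not taken from the source), loop `train_step` over batches of labeled essays, and then compare `predict` output against held-out labels with `accuracy`. The classification head is randomly initialized at load time, so predictions are only meaningful after fine-tuning. `mean_accuracy` takes an unweighted mean; whether the reported average is weighted by dataset size is not stated in the abstract.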