The development of large language models (LLMs) has made AI-generated text that closely mimics human writing widely available to the public. This poses serious problems for academic integrity, information verification, and document authentication. In this paper, we present a novel deep learning approach to distinguishing human-written from AI-generated text. We develop DETECTRIX, a hybrid transformer-based framework that combines optimized preprocessing with domain-adaptive training. Our approach analyzes textual context, linguistic features, and statistical writing patterns to distinguish human-authored from AI-generated content with high precision. Evaluation on a large dataset of academic writing, news articles, and creative writing shows that our model outperforms existing methods, achieving an F1-score of 97.8%. We also examine the persistent shortcomings of current detection approaches and identify directions for future research in light of evolving generative AI capabilities. This work contributes to maintaining authenticity in the face of sophisticated text generation tools.
Elwan et al. studied this question.