Abstract

Large Language Models (LLMs) can now generate text that closely resembles human writing, raising concerns about academic misuse and misinformation. Existing detection approaches often depend on a single type of feature, require direct access to the underlying models, and are sensitive to variations in text length and to paraphrasing. To address these issues, this paper proposes a Multi-Feature Accurate Detection (MFAD) approach that integrates handcrafted statistical and syntactic features with deep semantic features based on Global Vectors for Word Representation (GloVe) embeddings, Convolutional Neural Networks (CNNs), and Bidirectional Long Short-Term Memory (BiLSTM) networks. Experiments on the Human ChatGPT Comparison Corpus (HC3) show that MFAD achieves 98% accuracy, 96.5% precision, 97.5% recall, and a 97% F1-score, with a minimum False Positive Rate (FPR) of 0.01 across multiple domains. MFAD also generalizes well across LLMs such as GPT-4, Gemini, and Claude-4, and is resilient to text length variations and paraphrasing.
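For readers less familiar with the reported metrics, the following is a minimal sketch of how accuracy, precision, recall, F1-score, and FPR are conventionally derived from confusion-matrix counts. It assumes that "LLM-generated" is the positive class, which the abstract does not state explicitly. The function names and the example counts are hypothetical and are not taken from the paper's experiments.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumption: the positive class is "LLM-generated" text, so a false
    positive is a human-written text flagged as machine-generated.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),  # share of human texts wrongly flagged
    }


def f1_from(precision, recall):
    """F1-score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only (not results from the paper).
example = detection_metrics(tp=90, fp=5, tn=95, fn=10)

# Consistency check on the reported figures: precision 96.5% and
# recall 97.5% imply an F1-score that rounds to the reported 97%.
reported_f1 = f1_from(0.965, 0.975)
```

Under this convention, a low FPR means that few human-written texts are misclassified as LLM-generated, which is the error type most relevant to the academic-misuse setting the abstract describes.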
Mostafa et al. (Mon,) studied this question.