The rapid advancement of large language models (LLMs) has led to a substantial increase in AI‐generated text, raising concerns regarding academic integrity, content authenticity, and misinformation. Detecting such text has become increasingly challenging due to improved fluency, paraphrasing capabilities, and domain variability. This paper proposes a multirepresentation stacked ensemble framework for AI‐generated text detection that integrates heterogeneous feature spaces, including transformer‐based contextual embeddings, contrastive semantic representations, handcrafted linguistic features, and LLM‐derived meta‐features. A mutual information–based feature selection strategy is employed to reduce redundancy and enhance generalization. The selected features are processed through a multilevel ensemble consisting of diverse base learners and a logistic regression meta‐classifier. The proposed approach is evaluated on two publicly available benchmark datasets. Experimental results demonstrate that the model achieves an accuracy of 98.74% and an AUC of 0.997 on Dataset 01, and an accuracy of 97.92% with an AUC of 0.994 on Dataset 02. Comparative analysis with multiple baseline models and an ablation study confirm that the integration of heterogeneous representations significantly improves detection performance over single‐model approaches. These findings indicate that combining semantic, statistical, and meta‐level features within a unified ensemble framework provides a robust and generalizable solution for AI‐generated text detection.
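The selection-plus-stacking design described above can be illustrated with a minimal sketch, assuming scikit-learn and a synthetic feature matrix standing in for the concatenated embeddings, linguistic features, and meta-features. The abstract does not name the base learners, the number of retained features, or any hyperparameters, so the random forest and SVM base learners, `k=15`, and the variable names below are illustrative assumptions rather than the paper's configuration.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the fused multirepresentation feature matrix
# (assumption: the real features come from embeddings and handcrafted signals).
X, y = make_classification(
    n_samples=400, n_features=40, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Mutual information-based feature selection: keep the k highest-scoring columns.
# k=15 is an arbitrary illustrative value.
selector = SelectKBest(
    score_func=partial(mutual_info_classif, random_state=0), k=15
)

# Diverse base learners (hypothetical choices) whose out-of-fold predictions
# feed a logistic regression meta-classifier.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Selection and stacking are chained so the selector is fit on training data only.
model = make_pipeline(selector, stack)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
predictions = model.predict(X_test)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

Fitting the selector inside the pipeline keeps the mutual information scores from seeing the held-out split. The accuracy printed here reflects only the synthetic data and says nothing about the reported benchmark results.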
Md. Siam Ansary studied this question.