What question did this study set out to answer?

The aim is to develop a robust framework for detecting AI-generated text using integrated feature spaces.

May 28, 2026Open Access

A Multirepresentation Stacked Ensemble for AI‐Generated Text Detection

Key Points

The aim is to develop a robust framework for detecting AI-generated text using integrated feature spaces.
Developed a multirepresentation stacked ensemble framework using diverse feature spaces.
Utilized mutual information-based feature selection to enhance model generalization.
Evaluated the framework on two benchmark datasets for performance assessment.
Achieved 98.74% accuracy and an AUC of 0.997 on Dataset 01.
Achieved 97.92% accuracy and an AUC of 0.994 on Dataset 02.
Demonstrated significant performance improvements over baseline models through comparative analysis.

Abstract

The rapid advancement of large language models (LLMs) has led to a substantial increase in AI‐generated text, raising concerns regarding academic integrity, content authenticity, and misinformation. Detecting such text has become increasingly challenging due to improved fluency, paraphrasing capabilities, and domain variability. This paper proposes a multirepresentation stacked ensemble framework for AI‐generated text detection that integrates heterogeneous feature spaces, including transformer‐based contextual embeddings, contrastive semantic representations, handcrafted linguistic features, and LLM‐derived meta‐features. A mutual information–based feature selection strategy is employed to reduce redundancy and enhance generalization. The selected features are processed through a multilevel ensemble consisting of diverse base learners and a logistic regression meta‐classifier. The proposed approach is evaluated on two publicly available benchmark datasets. Experimental results demonstrate that the model achieves an accuracy of 98.74% and an AUC of 0.997 on Dataset 01, and an accuracy of 97.92% with an AUC of 0.994 on Dataset 02. Comparative analysis with multiple baseline models and an ablation study confirm that the integration of heterogeneous representations significantly improves detection performance over single‐model approaches. These findings indicate that combining semantic, statistical, and meta‐level features within a unified ensemble framework provides a robust and generalizable solution for AI‐generated text detection.

Read Full Paperexternally

Mark Helpful

Bookmark

Relay

View Full Paper