What question did this study set out to answer?

This work examines the effectiveness of GPTZero in detecting AI-generated writing and its implications for higher education.

June 4, 2026 | Open Access

GPTZero and the challenges of AI detection in assessing writing

Key Points

Analyzed the performance of GPTZero in distinguishing human-written from AI-written texts.
Assessed false-negative rates and classification errors in multilingual contexts.
Explored the impact on L2 writers and those in developing countries.
GPTZero exhibited substantial false-negative rates in classifying AI-generated essays.
Misclassification occurred frequently, particularly affecting non-native English speakers.
The findings suggest GPTZero is unsuitable for high-stakes writing assessment, highlighting the need for alternative approaches.

Abstract

GPTZero is an AI detection platform that scans written text for statistical signatures of machine generation and returns a probability score estimating whether it was produced by a human or an AI. In higher education, many teachers have turned to AI detection as a first-line response to the integrity crisis triggered by large language models. However, empirical findings on GPTZero’s efficacy are notably mixed. Some studies report strong diagnostic value under controlled conditions, while others document substantial false-negative rates, near-random performance on certain AI-generated essays, and frequent misclassification of AI-translated texts across several languages. Multilingual and L2 writers often bear the greatest cost, as their carefully constructed English is sometimes assigned high AI-likelihood scores because their linguistic profiles may appear less natural to models trained predominantly on standard or formulaic patterns of written English. In developing countries, where students commonly write in English as a second or third language, these limitations represent more than minor technical issues; they raise concerns about equity, potentially placing disproportionate burdens on writers working to meet academic language expectations. This article argues that GPTZero is unsuitable as a definitive tool for high-stakes assessment of writing. Instead, it proposes a shift toward postplagiarism frameworks that recognize responsible AI use. Within this approach, AI detection outputs serve as formative resources for developing critical AI literacy rather than surveillance tools. Flagged content becomes a starting point for metacognitive dialogue, which supports trust-based pedagogies that emphasize student agency and intellectual accountability.
