We present a unified formal framework proving that AI-generated text detection is information-theoretically impossible when the model's output distribution converges to the human text distribution, absent generator cooperation (e.g., watermarking). We formalize the detection problem as a statistical hypothesis test between two distributions over text strings and prove that the optimal distinguishing advantage of any detector is exactly equal to the total variation distance between the human and model distributions (Theorem 4.1). We connect this result to the language model training objective, showing via Pinsker's inequality that as cross-entropy loss approaches the entropy of human text, the maximum achievable detection advantage provably vanishes (Theorem 4.4). We extend the impossibility to conditional (per-context) distributions (Theorem 4.8), prove via the data processing inequality that post-processing can only degrade detection (Theorem 4.5), and establish a multi-sample detection bound showing that even with n independent texts from the same source, the optimal detection advantage grows at most as √n times the per-sample divergence, so that distributional convergence still dominates (Theorem 4.10). We connect distributional convergence to empirical scaling laws and provide a systematic taxonomy mapping every major class of AI text detection method to the specific theoretical result that bounds its performance. Our results establish that, under the empirically supported assumption of distributional convergence and absent generator cooperation, the failure of AI text detectors is not an engineering shortcoming but a mathematical consequence of the training objective.
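The chain of bounds summarized above can be checked numerically on a toy example. The sketch below is illustrative only and is not taken from the paper: the three-symbol "human" and "model" distributions are made up, the advantage is assumed to mean the difference in the probability of flagging a text under the two distributions, and the multi-sample bound is written in the standard form sqrt(n · KL / 2) obtained from Pinsker's inequality plus additivity of KL divergence over independent samples, which may differ in constants from the paper's Theorem 4.10. All function names are hypothetical.

```python
import math
from itertools import product


def total_variation(p, q):
    """Total variation distance between two discrete distributions (dicts)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q[k] > 0 wherever p[k] > 0."""
    return sum(pk * math.log(pk / q[k]) for k, pk in p.items() if pk > 0)


def best_advantage(p, q):
    """Advantage of the likelihood-ratio detector that flags x when q(x) > p(x).

    Advantage here is Pr_q[flag] - Pr_p[flag] (an assumed definition).
    """
    flagged = [k for k in set(p) | set(q) if q.get(k, 0.0) > p.get(k, 0.0)]
    return sum(q.get(k, 0.0) for k in flagged) - sum(p.get(k, 0.0) for k in flagged)


def product_dist(p, n):
    """Distribution of n independent draws from p, keyed by tuples."""
    return {combo: math.prod(p[k] for k in combo) for combo in product(p, repeat=n)}


# Toy distributions over three "texts"; the numbers are invented for illustration.
human = {"a": 0.50, "b": 0.30, "c": 0.20}
model = {"a": 0.45, "b": 0.33, "c": 0.22}

tv = total_variation(human, model)
adv = best_advantage(human, model)
# Excess cross-entropy of the model on human text equals KL(human || model).
kl = kl_divergence(human, model)
pinsker = math.sqrt(kl / 2)

n = 4
tv_n = total_variation(product_dist(human, n), product_dist(model, n))
bound_n = math.sqrt(n * kl / 2)

print(f"TV distance          : {tv:.6f}")
print(f"best detector adv.   : {adv:.6f}")
print(f"Pinsker bound        : {pinsker:.6f}")
print(f"TV with n={n} samples : {tv_n:.6f}")
print(f"sqrt(n*KL/2) bound   : {bound_n:.6f}")
```

In this toy setting the likelihood-ratio detector attains the total variation distance, and both the single-sample and n-sample distances stay below their Pinsker-style bounds; shrinking the gap between `human` and `model` drives all of these quantities toward zero together.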
David William Silva (Fri,) studied this question.