What question did this study set out to answer?

The survey aims to consolidate existing methods for forensic attribution of LLM-generated text using stylometric analysis.

June 11, 2026 | Open Access

The Machine's Hand: A Survey of Stylometry and Forensic Attribution in LLM Text

Key Points

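Consolidates existing methods for forensic attribution of LLM-generated text using stylometric analysis, without reporting new experiments, a new detector, or new theorems.
Synthesizes established ideas from authorship attribution, AI-generated-text detection, LLM-generated-text attribution, classical hypothesis testing, and work on prompt sensitivity and paraphrase evasion.
Translates these ideas into a common notation for forensic use.
Recommends treating source attribution as a calibrated, prompt-conditioned, pairwise statistical problem, replacing fixed word-count thresholds with prompt-stratified error curves, confidence intervals, and empirical estimates of the hardest source-pair separation.
Stylometric evidence can be strong in controlled domains, but it is domain-bound and can be weakened by prompts, short outputs, template constraints, paraphrasing, normalization, model drift, and open-set uncertainty.
Attribution is credible only to the extent that the chosen observation pipeline preserves measurable separation among the candidate sources in the relevant domain.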

Abstract

Stylometry offers one of the most useful lenses for studying whether generated text carries measurable traces of its source. In the setting of large language models (LLMs), those traces may include lexical, syntactic, punctuation, discourse-marker, curvature, likelihood, and feature-distribution patterns. This paper is a synthesis and methodological survey. It does not report new experiments, propose a new detector, or claim new information-theoretic theorems. Instead, it collects established ideas from authorship attribution, AI-generated-text detection, LLM-generated-text attribution, classical hypothesis testing, and recent work on prompt sensitivity and paraphrase evasion, and it translates them into a common notation for forensic use. The survey's main practical recommendation is to treat LLM source attribution as a calibrated, prompt-conditioned, pairwise statistical problem rather than as a search for a universal fingerprint. Fixed word-count thresholds should be replaced by prompt-stratified error curves, confidence intervals, and empirical estimates of the hardest source-pair separation. Stylometric evidence can be strong in controlled domains, but it is domain-bound and can be weakened by prompts, short outputs, template constraints, paraphrasing, normalization, model drift, and open-set uncertainty. The defensible conclusion is therefore conditional: attribution becomes credible only to the extent that the chosen observation pipeline preserves measurable separation among the candidate sources in the relevant domain.
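The recommendation to report prompt-stratified error rates with confidence intervals, together with an empirical estimate of the hardest source-pair separation, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the survey: the function names, the record layout, the use of a Wilson score interval, and the use of a standardized mean difference on a single scalar feature as the separation measure.

```python
import math
from itertools import combinations
from statistics import mean, variance


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for a binomial error rate (illustrative choice of CI)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = errors / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def stratified_error(records):
    """Error rate and interval per prompt stratum.

    records: iterable of (prompt_stratum, true_source, predicted_source).
    This record layout is hypothetical.
    """
    counts = {}
    for stratum, truth, pred in records:
        err, n = counts.get(stratum, (0, 0))
        counts[stratum] = (err + (truth != pred), n + 1)
    return {s: (err / n, wilson_interval(err, n)) for s, (err, n) in counts.items()}


def hardest_pair(features):
    """Return (separation, source_a, source_b) for the least separated pair.

    features: dict mapping source name -> list of scalar feature values
    (at least two values per source). Separation is a standardized mean
    difference, used here only as a stand-in for a real separation measure.
    """
    def separation(a, b):
        pooled = math.sqrt((variance(a) + variance(b)) / 2)
        return abs(mean(a) - mean(b)) / pooled if pooled > 0 else math.inf

    return min(
        (separation(features[a], features[b]), a, b)
        for a, b in combinations(sorted(features), 2)
    )
```

In this sketch, a wide interval for one stratum (for example, short outputs) or a small separation for one source pair signals where an attribution claim would be weakest, which is the kind of conditional reporting the abstract argues for.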
