July 3, 2025

Open Access

Attributing authorship via the perplexity of authorial language models

Abstract

Authorship attribution is the task of identifying the most likely author of a questioned document from a set of candidate authors, where each candidate is represented by a writing sample. A wide range of quantitative methods for inferring authorship have been developed in stylometry, but the rise of Large Language Models (LLMs) offers new opportunities in this field. In this paper, we introduce a technique for authorship attribution based on fine-tuned LLMs. Our approach first further pretrains an LLM for each candidate author on that author's known writings, and then assigns the questioned document to the author whose Authorial Language Model (ALM) finds it most predictable, as measured by the perplexity of the questioned document under that ALM. We find that our approach meets or exceeds the current state-of-the-art on several standard benchmarking datasets. In addition, we show how our approach can be used to measure the predictability of each word in a questioned document for a given candidate ALM, allowing the linguistic patterns that drive our attributions to be inspected directly. Finally, we analyze what types of words generally drive successful attributions, finding that content word classes are characterized by a higher density of authorship information than function word classes, challenging a long-standing assumption of stylometry.
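
The decision rule described in the abstract (score the questioned document under each candidate's model, then pick the author with the lowest perplexity) can be illustrated with a minimal Python sketch. This is not the authors' implementation: a word-bigram model with add-one smoothing stands in for the further-pretrained LLM, and the names ToyALM, tokenize, and attribute are hypothetical, introduced only for illustration. Perplexity is computed in the standard way, as the exponential of the negative mean log-probability per token.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; an LLM would use subword tokens.
    return text.lower().split()


class ToyALM:
    """Hypothetical stand-in for an Authorial Language Model (ALM).

    A word-bigram model with add-one smoothing replaces the fine-tuned LLM
    so that the sketch runs with the standard library only.
    """

    def __init__(self, sample, vocab):
        tokens = ["<s>"] + tokenize(sample)
        self.vocab = vocab
        self.context_counts = Counter(tokens[:-1])
        self.bigram_counts = Counter(zip(tokens[:-1], tokens[1:]))

    def log_prob(self, prev, word):
        # Add-one smoothing keeps unseen bigrams at a non-zero probability.
        numerator = self.bigram_counts[(prev, word)] + 1
        denominator = self.context_counts[prev] + len(self.vocab)
        return math.log(numerator / denominator)

    def token_log_probs(self, document):
        # Per-word predictability: one (word, log-probability) pair per token.
        tokens = ["<s>"] + tokenize(document)
        return [(w, self.log_prob(p, w)) for p, w in zip(tokens[:-1], tokens[1:])]

    def perplexity(self, document):
        log_probs = [lp for _, lp in self.token_log_probs(document)]
        if not log_probs:
            raise ValueError("questioned document is empty")
        # Perplexity = exp of the negative mean log-probability per token.
        return math.exp(-sum(log_probs) / len(log_probs))


def attribute(questioned, samples):
    """Return (most likely author, perplexity per author) for a questioned document."""
    texts = list(samples.values()) + [questioned]
    vocab = {w for text in texts for w in tokenize(text)}
    scores = {
        author: ToyALM(sample, vocab).perplexity(questioned)
        for author, sample in samples.items()
    }
    # The attributed author is the one whose model finds the document most predictable.
    return min(scores, key=scores.get), scores
```

Calling attribute(questioned, samples) with a dictionary that maps author names to writing samples returns the lowest-perplexity author together with all scores, and token_log_probs exposes the per-word values that make an attribution inspectable. With a real LLM, the same rule would apply to token-level log-probabilities from each further-pretrained model.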

Authors

Weihang Huang

Akira Murakami

Jack Grieve

Journals

PLoS ONE

Institutions

University of Birmingham

West Midlands Police
