Authorship attribution is the task of identifying the most likely author of a questioned document from a set of candidate authors, where each candidate is represented by a writing sample. A wide range of quantitative methods for inferring authorship have been developed in stylometry, but the rise of Large Language Models (LLMs) offers new opportunities in this field. In this paper, we introduce a technique for authorship attribution based on fine-tuned LLMs. Our approach first further pretrains an LLM for each candidate author on that author's known writings, and then assigns the questioned document to the author whose Authorial Language Model (ALM) finds it most predictable, that is, the ALM under which the questioned document has the lowest perplexity. We find that our approach meets or exceeds the current state-of-the-art on several standard benchmarking datasets. In addition, we show how our approach can be used to measure the predictability of each word in a questioned document for a given candidate ALM, allowing the linguistic patterns that drive our attributions to be inspected directly. Finally, we analyze what types of words generally drive successful attributions, finding that content word classes are characterized by a higher density of authorship information than function word classes, challenging a long-standing assumption of stylometry.
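The decision rule described in the abstract (score the questioned document under one model per candidate author, then pick the author with the lowest perplexity) can be illustrated with a minimal Python sketch. This is not the authors' implementation: an add-one-smoothed unigram model stands in for each fine-tuned ALM so that the example runs without any model download, and the names `train_unigram`, `token_surprisals`, `perplexity`, and `attribute`, as well as the two toy writing samples, are hypothetical. The sketch assumes whitespace tokenization and a non-empty questioned document.

```python
import math
from collections import Counter

def train_unigram(text, vocab):
    """Add-one-smoothed unigram model (ASSUMPTION: toy stand-in for a fine-tuned ALM)."""
    counts = Counter(text.lower().split())
    # +1 reserves probability mass for a single "unseen word" bucket.
    total = sum(counts.values()) + len(vocab) + 1
    return lambda word: (counts.get(word, 0) + 1) / total

def token_surprisals(model, text):
    """Per-token negative log-probability; higher means less predictable."""
    return [(w, -math.log(model(w))) for w in text.lower().split()]

def perplexity(model, text):
    """Exponential of the mean per-token surprisal."""
    surprisals = token_surprisals(model, text)
    if not surprisals:
        raise ValueError("questioned document is empty")
    return math.exp(sum(s for _, s in surprisals) / len(surprisals))

def attribute(samples, questioned):
    """Return the candidate whose model gives the lowest perplexity, plus all scores."""
    vocab = {w for t in samples.values() for w in t.lower().split()}
    models = {author: train_unigram(t, vocab) for author, t in samples.items()}
    scores = {author: perplexity(m, questioned) for author, m in models.items()}
    return min(scores, key=scores.get), scores

# Hypothetical toy data, for illustration only.
samples = {
    "A": "the cat sat on the mat the cat purred",
    "B": "stocks fell sharply as markets closed lower today",
}
best, scores = attribute(samples, "the cat sat")
print(best, scores)
```

The per-token values returned by `token_surprisals` correspond loosely to the word-level predictability the abstract mentions: inspecting them shows which words make a document more or less expected under a given candidate's model. With real ALMs the same argmin-over-perplexity rule would apply, with token probabilities coming from the fine-tuned LLM instead of word counts.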
Weihang Huang
Akira Murakami
Jack Grieve
PLoS ONE
University of Birmingham
West Midlands Police
Huang et al. studied this question.
www.synapsesocial.com/papers/6a07fa7b217278811afe10f7 — DOI: https://doi.org/10.1371/journal.pone.0327081