Large Language Model (LLM) embeddings achieve strong performance in authorship attribution, yet it remains unclear which aspects of literary style they encode. We address this question through a residualization analysis of two poetry corpora: 5,800 Russian poems (29 authors) and 10,400 Italian poems spanning seven centuries (52 authors). Using a progressive residualization waterfall, we subtract interpretable stylometric features and high-dimensional lexical controls from embedding representations to quantify their contribution to attribution accuracy. For Russian poetry, residual signal collapses to near chance (1.1 times chance) after accounting for character n-grams and word bigrams, indicating that embeddings largely compress orthographic and lexical distributions already exploited in classical stylometry. For Italian poetry, a reduced but significant residual persists (4.6 times chance), consistent with diachronic or dialectal variation not fully captured by standard features. We conclude that embeddings and stylometry rely on overlapping signals but differ in how they weight lexical, semantic, and historical variation.
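As an illustration only, the progressive residualization waterfall described above can be sketched as follows. The abstract does not specify the regression model, the classifier, or the evaluation protocol, so everything here is an assumption: ridge regression is used to remove the part of the embeddings that the control features linearly predict, a nearest-centroid classifier stands in for the attribution model, and the function names (`residualize`, `waterfall`, `times_chance`) are hypothetical.

```python
import numpy as np


def residualize(embeddings, controls, train_mask, ridge=1e-3):
    """Remove the part of `embeddings` linearly predictable from `controls`.

    Assumption: a ridge regression fitted on the training rows only;
    the source does not state which regression was used.
    """
    mu_c = controls[train_mask].mean(axis=0)
    mu_e = embeddings[train_mask].mean(axis=0)
    X = controls[train_mask] - mu_c
    Y = embeddings[train_mask] - mu_e
    # Closed-form ridge solution: B = (X'X + ridge * I)^-1 X'Y
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    B = np.linalg.solve(gram, X.T @ Y)
    # Apply the fitted map to every row and keep what is left over.
    return (embeddings - mu_e) - (controls - mu_c) @ B


def nearest_centroid_accuracy(features, labels, train_mask):
    """Held-out accuracy of a nearest-centroid classifier (a stand-in model)."""
    classes = np.unique(labels[train_mask])
    centroids = np.stack(
        [features[train_mask & (labels == c)].mean(axis=0) for c in classes]
    )
    test = features[~train_mask]
    dists = ((test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    predicted = classes[dists.argmin(axis=1)]
    return float((predicted == labels[~train_mask]).mean())


def times_chance(accuracy, n_classes):
    """Express accuracy as a multiple of the uniform-guess rate 1 / n_classes."""
    return accuracy * n_classes


def waterfall(embeddings, control_blocks, labels, train_mask):
    """Residualize against cumulatively stacked control blocks.

    `control_blocks` is a list of (name, matrix) pairs, e.g. stylometric
    features first, then character n-grams, then word bigrams.
    Returns (step name, accuracy as a multiple of chance) for each step.
    """
    n_classes = len(np.unique(labels))
    acc = nearest_centroid_accuracy(embeddings, labels, train_mask)
    results = [("raw embeddings", times_chance(acc, n_classes))]
    stacked = None
    for name, block in control_blocks:
        stacked = block if stacked is None else np.hstack([stacked, block])
        residual = residualize(embeddings, stacked, train_mask)
        acc = nearest_centroid_accuracy(residual, labels, train_mask)
        results.append(("after " + name, times_chance(acc, n_classes)))
    return results
```

Under this reading, a result of 1.1 times chance with 29 authors would correspond to an accuracy of roughly 1.1 / 29, or about 3.8%, and 4.6 times chance with 52 authors to roughly 4.6 / 52, or about 8.8%. These conversions assume that chance means uniform guessing over authors, which the abstract does not state explicitly.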
Maria Levchenko studied this question.