We present an annotated dataset of 283 poems by Federico García Lorca spanning nine major works (1921-1940), enriched with publication metadata, structural features, and GPT-4-generated thematic and contextual annotations. We describe the construction pipeline, from EPUB extraction through synthetic annotation to bibliographic resolution, and demonstrate its application to evaluating the generalization capacity of large language models (LLMs) in literary style transfer. As a case study, we fine-tune Llama 3 8B on this corpus and evaluate its ability to generate original poetry that reflects Lorca's distinctive stylistic patterns. Our results suggest that even sub-10B-parameter models can internalize non-trivial aspects of a specific author's voice from fewer than 300 training examples, raising questions about the nature of stylistic representation in neural language models.
Xavier Vinaixa Roselló (Thu,) studied this question.