What does this research mean for the field?

Fine-tuning on fewer than 300 poems by Federico García Lorca appears to let sub-10B parameter language models internalize and generate poetry that reflects Lorca's distinctive stylistic patterns. Novelty: novel finding. Consensus alignment: establishes a new direction.

What question did this study set out to answer?

This research aims to assess how well large language models can learn and reproduce the unique style of Federico García Lorca's poetry.

March 14, 2026 | Open Access

Evaluating LLM Generalization of Literary Style: Fine-tuning on Federico García Lorca's Poetry

Key Points

Created an annotated dataset of 283 poems by Federico García Lorca.
Utilized GPT-4 for thematic and contextual annotations.
Fine-tuned Llama 3 8B on the compiled poetry dataset.
Evaluated the fine-tuned model's ability to generate original poetry reflecting Lorca's style.
Smaller models (<10B parameters) appear able to learn Lorca's stylistic patterns from fewer than 300 poems.
Generated poetry displays distinct elements of Lorca's voice, suggesting that style transfer succeeded.
The results prompt further exploration into how literary style is represented in neural models.

Abstract

We present an annotated dataset of 283 poems by Federico García Lorca spanning nine major works (1921-1940), enriched with publication metadata, structural features, and GPT-4-generated thematic and contextual annotations. We describe the construction pipeline, from EPUB extraction to synthetic annotation and bibliographic resolution, and demonstrate its application to evaluating the generalization capacity of large language models (LLMs) in the domain of literary style transfer. As a case study, we fine-tune Llama 3 8B on this corpus and evaluate its ability to generate original poetry that reflects Lorca's distinctive stylistic patterns. Our results suggest that even sub-10B parameter models can internalize non-trivial aspects of a specific author's voice from fewer than 300 training examples, raising questions about the nature of stylistic representation in neural language models.
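To make the pipeline in the abstract more concrete, the following is a minimal sketch of how one annotated poem might be represented and converted into a prompt/completion pair for supervised fine-tuning. It is not the author's implementation: the `PoemRecord` class, its field names, the prompt wording, and the JSONL layout are all assumptions made for illustration, and the sample text is a placeholder, not a poem from the dataset.

```python
import json
from dataclasses import dataclass, field


@dataclass
class PoemRecord:
    """One annotated poem. All field names here are hypothetical."""
    title: str
    collection: str
    year: int
    text: str
    themes: list[str] = field(default_factory=list)  # e.g. model-generated annotations

    def structural_features(self) -> dict:
        """Count non-empty lines and blank-line-separated stanzas."""
        stanzas = [s for s in self.text.split("\n\n") if s.strip()]
        lines = [ln for ln in self.text.splitlines() if ln.strip()]
        return {"stanzas": len(stanzas), "lines": len(lines)}


def to_training_example(poem: PoemRecord) -> dict:
    """Build a prompt/completion pair (assumed format, not the paper's)."""
    feats = poem.structural_features()
    prompt = (
        "Write a poem in the style of Federico García Lorca.\n"
        f"Themes: {', '.join(poem.themes)}\n"
        f"Length: {feats['lines']} lines in {feats['stanzas']} stanzas\n"
    )
    return {"prompt": prompt, "completion": poem.text}


def to_jsonl(poems: list[PoemRecord]) -> str:
    """Serialize records as one JSON object per line."""
    return "\n".join(
        json.dumps(to_training_example(p), ensure_ascii=False) for p in poems
    )


if __name__ == "__main__":
    # Placeholder record; the text is NOT taken from the dataset.
    sample = PoemRecord(
        title="Placeholder title",
        collection="Placeholder collection",
        year=1921,
        text="line one\nline two\n\nline three\nline four",
        themes=["moon", "night"],
    )
    print(to_jsonl([sample]))
```

Putting the annotations in the prompt and the poem in the completion is one plausible design: the model is conditioned on themes and structure at generation time, and those same fields can later be varied to test whether it generalizes beyond the training poems.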

Cite This Study

Xavier Vinaixa Roselló studied this question.

synapsesocial.com/papers/69b4fc0eb39f7826a300cb27
https://doi.org/10.5281/zenodo.18975628
