What question did this study set out to answer?

This research aims to create a high-quality dataset of fable stories for fine-tuning large language models.

March 18, 2026

A dataset of fable stories for instruction fine-tuning of large language models

Key Points

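The authors build a thematic library of 300 core themes spanning classic fables, cultural allusions, and modern issues.
Stories are generated with a five-step progressive strategy: character and background generation, conflict construction, solution design, conclusion derivation, and story synthesis.
The GLM-4 model is used for content generation, with model parameters fine-tuned to improve generation quality.
Generated stories are filtered through a multi-dimensional quality evaluation performed with a large language model API.
The stage-wise strategy is designed to ensure logical coherence, completeness, and thematic consistency.
Only stories that meet the predefined quality standards are retained in the dataset.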
Potential applications include enhancing the narrative capabilities of large language models.

Abstract

The study proposes and constructs a dataset of fable stories for instruction fine-tuning of large language models, aiming to provide high-quality training data through multi-source thematic collections and a stage-wise generation strategy. Dataset construction begins with a thematic library built along three dimensions (classic fables, cultural allusions, and modern issues) that encompasses 300 core themes, ensuring data diversity and broad cultural coverage. A five-step progressive generation strategy is then applied, consisting of character and background generation, conflict construction, solution design, conclusion derivation, and story synthesis, to ensure the logical coherence, completeness, and thematic consistency of the generated stories. During generation, the GLM-4 model is used for content creation, with model parameters fine-tuned to improve generation quality. The texts are also subjected to multi-dimensional quality evaluation using a large language model API, and only stories that meet the predefined standards are kept. The dataset provides high-quality textual resources for fine-tuning large language models and has research value, particularly for exploring the narrative logic and cultural adaptability of large language models in text generation.
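As a rough illustration of the stage-wise strategy and the quality filter described in the abstract, the following Python sketch chains the five named stages and keeps only stories whose scores clear a threshold. It is not the authors' implementation: the `generate` and `score` callables, the prompt layout, the score dimensions, and the 0.8 threshold are all assumptions made for this example, and no real GLM-4 API call is shown.

```python
from typing import Callable, Dict, List

# The five stages named in the abstract, in order.
STAGES = [
    "character and background generation",
    "conflict construction",
    "solution design",
    "conclusion derivation",
    "story synthesis",
]

# Hypothetical interfaces: a wrapper around some text model, and an evaluator
# that returns one score per quality dimension. Neither comes from the paper.
Generator = Callable[[str], str]
Scorer = Callable[[str], Dict[str, float]]


def build_prompt(theme: str, stage: str, history: List[str]) -> str:
    """Assemble a stage prompt that carries forward all earlier stage outputs."""
    context = "\n".join(history) if history else "(none yet)"
    return f"Theme: {theme}\nStage: {stage}\nPrevious output:\n{context}"


def generate_story(theme: str, generate: Generator) -> str:
    """Run the five stages in order; the final stage's output is the story."""
    history: List[str] = []
    for stage in STAGES:
        history.append(generate(build_prompt(theme, stage, history)))
    return history[-1]


def passes_quality(scores: Dict[str, float], threshold: float) -> bool:
    """Accept a story only if every scored dimension reaches the threshold."""
    return bool(scores) and all(v >= threshold for v in scores.values())


def build_dataset(
    themes: List[str],
    generate: Generator,
    score: Scorer,
    threshold: float = 0.8,  # assumed cutoff, not a value from the paper
) -> List[Dict[str, str]]:
    """Generate one story per theme and keep those that pass the filter."""
    kept = []
    for theme in themes:
        story = generate_story(theme, generate)
        if passes_quality(score(story), threshold):
            kept.append({"theme": theme, "story": story})
    return kept
```

Passing each stage the accumulated output of the earlier stages is one simple way to keep the conflict, solution, and conclusion tied to the same characters and theme; the paper may structure its prompts differently.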


Cite This Study

WAN et al. studied this question.

synapsesocial.com/papers/69ba43f74e9516ffd37a5b86 https://doi.org/10.11922/11-6035.csd.2025.0058.zh
