This study constructs a dataset of fable stories for instruction fine-tuning of large language models, providing high-quality training data through a multi-source thematic library and a stage-wise generation strategy. Data collection begins with a thematic library built along three dimensions (classic fables, cultural allusions, and modern issues) that covers 300 core themes, ensuring data diversity and broad cultural coverage. A five-step progressive generation strategy is then applied, consisting of character and background generation, conflict construction, solution design, conclusion derivation, and story synthesis, which ensures the logical coherence, completeness, and thematic consistency of the generated stories. During generation, the GLM-4 model is used for content creation, with model parameters fine-tuned to improve generation quality. The texts are then evaluated along multiple quality dimensions using a large language model API, and only stories that meet the predefined standards are retained. The dataset provides high-quality textual resources for fine-tuning large language models and is of particular research value for exploring the narrative logic and cultural adaptability of large language models in text generation.
WAN et al. (Sun,) studied this question.
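The five-step generation strategy and the subsequent quality filter described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stage identifiers, the prompt wording, the `generate` callable (a stand-in for a call to a model such as GLM-4), and the all-dimensions-above-threshold rule are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# The five stages named in the text. The identifiers are illustrative assumptions.
STAGES: List[str] = [
    "characters_and_background",
    "conflict",
    "solution",
    "conclusion",
    "story",
]


def generate_fable(theme: str, generate: Callable[[str], str]) -> Dict[str, str]:
    """Run the stages in order, passing each stage the outputs of the earlier ones.

    `generate` is a hypothetical text-generation function (prompt in, text out);
    in practice it would wrap a model API call.
    """
    outputs: Dict[str, str] = {}
    for stage in STAGES:
        # Earlier outputs become context so later stages stay consistent with them.
        context = "\n".join(f"{name}: {text}" for name, text in outputs.items())
        prompt = f"Theme: {theme}\n{context}\nWrite the {stage} for this fable."
        outputs[stage] = generate(prompt)
    return outputs


def passes_quality(scores: Dict[str, float], threshold: float) -> bool:
    """Keep a story only if it was scored and every dimension meets the threshold.

    The filtering rule is an assumption; the text only states that stories
    meeting predefined standards are kept.
    """
    return bool(scores) and all(score >= threshold for score in scores.values())


def stub_generate(prompt: str) -> str:
    # Placeholder generator so the sketch runs without any model access.
    return f"[generated text for a {len(prompt)}-character prompt]"


if __name__ == "__main__":
    fable = generate_fable("honesty", stub_generate)
    print(fable["story"])
```

Feeding every earlier output back into each prompt is one simple way to keep later stages consistent with the characters, conflict, and solution already produced. A real pipeline would replace `stub_generate` with a model call and obtain the scores from the evaluating model.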