In this work, we examine approaches to fine-tuning large language models (LLMs) for summarizing transcripts of Russian-language dialogues. Our main hypothesis is that sequential fine-tuning on two datasets (a general one that improves the model's performance in the target language, and a specialized one consisting primarily of dialogue texts) can significantly improve summarization quality compared to fine-tuning on only one of them. To test this hypothesis, we implemented a complete experimental pipeline based on the Qwen2.5-7B model using Low-Rank Adaptation (LoRA). We used two Russian-language datasets: RussianNLP/Mixed-Summarization-Dataset for general fine-tuning and a combined SAMSum-ru + DialogSum-ru dataset for specialized dialogue fine-tuning. We compared three fine-tuning scenarios using the ROUGE and BERTScore metrics. The results show that the two-stage approach provides a synergistic effect: a 90% improvement in ROUGE-1 on dialogue texts, with performance on general texts maintained. Fine-tuning on both datasets outperforms the single-dataset approaches in both generality and specialized performance.
Aleksey Suin studied this question.
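To make the ROUGE-1 figure quoted in the abstract more concrete, the following is a minimal sketch of how a unigram-overlap F1 score and a relative improvement between two systems might be computed. It is not the evaluation code used in this work: the whitespace tokenization, the function names `rouge1_f1` and `relative_improvement`, and the example strings are all assumptions made for illustration, and the sketch assumes the reported 90% is a relative gain over a baseline score.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary.

    Assumption: tokens are lowercased whitespace-separated words; real
    evaluations typically use a proper tokenizer and optional stemming.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: a token counts at most as often as it occurs in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def relative_improvement(baseline: float, tuned: float) -> float:
    """Relative gain of `tuned` over `baseline`, in percent."""
    return (tuned - baseline) / baseline * 100.0


# Made-up example strings, not taken from any dataset used in this work.
reference = "анна и борис договорились встретиться завтра в пять"
baseline_summary = "анна пишет борис"
tuned_summary = "анна и борис встретятся завтра в пять"

baseline_score = rouge1_f1(reference, baseline_summary)
tuned_score = rouge1_f1(reference, tuned_summary)
print(f"baseline ROUGE-1 F1: {baseline_score:.3f}")
print(f"tuned ROUGE-1 F1:    {tuned_score:.3f}")
print(f"relative improvement: {relative_improvement(baseline_score, tuned_score):.1f}%")
```

Under this reading, a 90% improvement means the fine-tuned score is 1.9 times the baseline score, not that the score rises by 90 absolute points. Corpus-level ROUGE is usually reported as an average of such per-example scores.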