May 13, 2024 | Open Access

Strategic Data Ordering: Enhancing Large Language Model Performance through Curriculum Learning

Abstract

The rapid advancement of Large Language Models (LLMs) has improved text understanding and generation but poses challenges in computational resources. This study proposes a curriculum learning-inspired, data-centric training strategy that begins with simpler tasks and progresses to more complex ones, using criteria such as prompt length, attention scores, and loss values to structure the training data. Experiments with Mistral-7B (Jiang et al., 2023) and Gemma-7B (Team et al., 2024) models demonstrate that curriculum learning slightly improves performance compared to traditional random data shuffling. Notably, we observed that sorting data based on our proposed attention criteria generally led to better performance. This approach offers a sustainable method to enhance LLM performance without increasing model size or dataset volume, addressing scalability challenges in LLM training.
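
The ordering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy examples, and the whitespace word count used as a stand-in for prompt length are all assumptions made for illustration. A loss-based or attention-based criterion would need scores computed from a model, which is outside the scope of this sketch; any callable that returns a number per example could be passed in its place.

```python
import random
from typing import Callable, List, Sequence


def prompt_length(example: dict) -> int:
    # Hypothetical difficulty proxy: number of whitespace-separated words.
    # A real pipeline would more likely count tokenizer tokens.
    return len(example["prompt"].split())


def curriculum_order(
    examples: Sequence[dict],
    difficulty: Callable[[dict], float],
) -> List[dict]:
    # Sort from "easy" (low score) to "hard" (high score).
    # sorted() returns a new list and leaves the input untouched.
    return sorted(examples, key=difficulty)


def random_order(examples: Sequence[dict], seed: int = 0) -> List[dict]:
    # Baseline: the usual random shuffle, seeded for reproducibility.
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Toy data, invented for this example.
toy_data = [
    {"prompt": "Summarize the following paragraph in one short sentence"},
    {"prompt": "Translate hello"},
    {"prompt": "Explain why the sky is blue"},
]

ordered = curriculum_order(toy_data, prompt_length)
baseline = random_order(toy_data, seed=0)

for example in ordered:
    print(prompt_length(example), example["prompt"])
```

Passing the difficulty criterion as a function keeps the ordering step independent of how difficulty is measured, so the same code could compare several criteria against the shuffled baseline.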

Cite This Study

Kim et al. (May 13, 2024) studied this question.

synapsesocial.com/papers/68e6a745b6db64358762a3e6 https://doi.org/10.48550/arxiv.2405.07490
