To address the shortage of challenging training data for reasoning-oriented large language models, this study constructs SD1K, a challenging mathematical reasoning dataset, using a methodology that integrates concept extraction with model-based synthesis. The dataset is built upon a collection of 7,500 training samples from the Sky-T1-32B-Preview model, from which 9,054 mathematical concept entities are extracted. Each concept entity includes a formal definition, application scenarios, and a usage example. New mathematical problems are generated by randomly sampling these entities and prompting large language models to combine them, and a long-form reasoning chain and corresponding answer are automatically constructed for each problem. A filtering process that combines predefined rules with model-based verification then selects 1,000 high-quality, challenging problems, which form the final SD1K dataset. SD1K provides a resource for training and evaluating large language models on reasoning tasks and is intended to support improved model performance in complex inference scenarios.
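The construction pipeline described above (sample concept entities, synthesize a problem with its reasoning chain and answer, then filter by rules and model-based verification) can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`ConceptEntity`, `Problem`, `build_prompt`, `passes_rules`, `build_dataset`), the prompt wording, the number of concepts sampled per problem, and the length threshold in the rule filter are assumptions introduced for illustration. The generator and verifier models are passed in as plain callables, so no particular LLM API is assumed.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ConceptEntity:
    # Fields mirror the three components named in the text.
    name: str
    definition: str
    scenarios: str
    example: str


@dataclass
class Problem:
    question: str
    reasoning: str  # long-form reasoning chain
    answer: str


def build_prompt(concepts: List[ConceptEntity]) -> str:
    # Hypothetical prompt template; the actual wording is not given in the source.
    lines = [
        f"- {c.name}: {c.definition} (used in: {c.scenarios}; e.g. {c.example})"
        for c in concepts
    ]
    return "Write one challenging math problem combining these concepts:\n" + "\n".join(lines)


def passes_rules(p: Problem, min_reasoning_chars: int = 200) -> bool:
    # Placeholder rule filter (assumed): non-empty question and answer,
    # and a reasoning chain of at least min_reasoning_chars characters.
    return (
        bool(p.question.strip())
        and bool(p.answer.strip())
        and len(p.reasoning) >= min_reasoning_chars
    )


def build_dataset(
    entities: List[ConceptEntity],
    generate: Callable[[str], Problem],  # wraps the synthesis model
    verify: Callable[[Problem], bool],   # wraps the verifier model
    target_size: int,
    k: int = 2,                          # concepts per problem (assumed)
    max_attempts: int = 10_000,
    seed: int = 0,
) -> List[Problem]:
    rng = random.Random(seed)
    kept: List[Problem] = []
    for _ in range(max_attempts):
        if len(kept) >= target_size:
            break
        concepts = rng.sample(entities, k)  # random sampling without replacement
        problem = generate(build_prompt(concepts))
        # Keep a problem only if it passes both the rule filter and the verifier.
        if passes_rules(problem) and verify(problem):
            kept.append(problem)
    return kept
```

Under these assumptions, the setting in the text would correspond to calling `build_dataset` with the 9,054 extracted entities and `target_size=1000`, with `generate` and `verify` wrapping whichever models are used for synthesis and verification.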
Zhu et al. (Sun,) studied this question.