What question did this study set out to answer?

The aim is to create a robust dataset for training and evaluating large language models in mathematical reasoning.

March 29, 2026 | Open Access

A dataset of challenging mathematical problems for reasoning large language models: SD1K

Key Points

Constructed the SD1K dataset from 7,500 training samples of the Sky-T1-32B-Preview model.
Extracted 9,054 mathematical concept entities with definitions and applications.
Generated new mathematical problems using random sampling of these entities.
Applied filtering with predefined rules and model-based verification to ensure quality.
Selected 1,000 high-quality and challenging mathematical problems.
The dataset supports the training and evaluation of large language models on reasoning tasks.

Abstract

To address the lack of sufficiently challenging training data for reasoning large language models, this study constructs a challenging mathematical reasoning dataset, termed SD1K, using a methodology that integrates concept extraction and model synthesis. The dataset is built upon a collection of 7,500 training samples from the Sky-T1-32B-Preview model, from which 9,054 mathematical concept entities are extracted. Each concept entity includes a formal definition, application scenarios, and a usage example. New mathematical problems are generated by prompting large language models with randomly sampled entities, and a long-form reasoning chain and corresponding answer are automatically constructed for each problem. A filtering process that combines predefined rules with model-based verification then selects 1,000 high-quality, challenging mathematical problems, which form the final SD1K dataset. SD1K provides a resource for training and evaluating large language models on reasoning tasks and is intended to help improve model performance in complex inference scenarios.
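The pipeline described in the abstract (sample concept entities, build a generation prompt, then filter by rules and model-based verification) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `ConceptEntity` field names, the prompt wording, the specific rules in `passes_rules`, and the `verify` callable that stands in for model-based verification are all assumptions made for illustration; the paper's actual rules and models are not given in this summary.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ConceptEntity:
    # Hypothetical field names mirroring the abstract: definition,
    # application scenarios, and a usage example.
    name: str
    definition: str
    applications: str
    example: str


def sample_concepts(entities: list[ConceptEntity], k: int,
                    rng: random.Random) -> list[ConceptEntity]:
    """Randomly draw k distinct concept entities."""
    return rng.sample(entities, k)


def build_prompt(concepts: list[ConceptEntity]) -> str:
    """Assemble an illustrative generation prompt (wording is an assumption)."""
    lines = ["Write one challenging math problem that combines these concepts:"]
    for c in concepts:
        lines.append(f"- {c.name}: {c.definition}")
    return "\n".join(lines)


def passes_rules(candidate: dict, min_reasoning_chars: int = 200) -> bool:
    """Hypothetical predefined rules: non-empty problem and answer,
    and a reasoning chain of at least min_reasoning_chars characters."""
    return (
        bool(candidate.get("problem", "").strip())
        and bool(candidate.get("answer", "").strip())
        and len(candidate.get("reasoning", "")) >= min_reasoning_chars
    )


def select_problems(candidates: list[dict],
                    verify: Callable[[dict], bool],
                    limit: int) -> list[dict]:
    """Keep candidates that pass the rules and the verifier, up to limit.
    `verify` is a placeholder for model-based verification."""
    kept = []
    for cand in candidates:
        if passes_rules(cand) and verify(cand):
            kept.append(cand)
            if len(kept) == limit:
                break
    return kept
```

In this sketch, `limit` would be 1,000 to match the size of the final dataset, and `verify` could wrap a call to whichever model is used as the checker.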
