Large language models (LLMs) have demonstrated remarkable capabilities in problem-solving. However, their proficiency in solving mathematical problems remains inadequate. We propose MathScale, a simple and scalable method to create high-quality mathematical reasoning data using frontier LLMs (e.g., GPT-3.5). Inspired by the cognitive mechanism in human mathematical learning, it first extracts topics and knowledge points from seed math questions and then builds a concept graph, which is subsequently used to generate new math questions. MathScale exhibits effective scalability along the size axis of the math dataset that we generate. As a result, we create a mathematical reasoning dataset (MathScaleQA) containing two million math question-answer pairs. To evaluate the mathematical reasoning abilities of LLMs comprehensively, we construct MwpBench, a benchmark of Math Word Problems, which is a collection of ten datasets (including GSM8K and MATH) covering K-12, college, and competition-level math problems. We apply MathScaleQA to fine-tune open-source LLMs (e.g., LLaMA-2 and Mistral), resulting in significantly improved capabilities in mathematical reasoning. Evaluated on MwpBench, MathScale-7B achieves state-of-the-art performance across all datasets, surpassing its best peers of equivalent size by 42.9% in micro average accuracy and 43.7% in macro average accuracy, respectively.
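The pipeline described in the abstract (extract concepts, build a concept graph, sample from it to generate new questions) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not say how edges are defined or how the graph is sampled, so this sketch uses co-occurrence counts as edge weights and a weighted random walk. The names `SEED_EXTRACTIONS`, `build_concept_graph`, `sample_concepts`, and `make_prompt` are hypothetical, and the extraction step, which MathScale performs with an LLM, is replaced by hand-written data.

```python
import itertools
import random
from collections import defaultdict

# Hypothetical stand-in for the LLM extraction step: each entry holds the
# topics and knowledge points extracted from one seed question.
SEED_EXTRACTIONS = [
    {"topics": ["arithmetic"], "knowledge_points": ["fractions", "ratios"]},
    {"topics": ["arithmetic"], "knowledge_points": ["fractions", "percentages"]},
    {"topics": ["geometry"], "knowledge_points": ["area of a rectangle", "ratios"]},
]


def build_concept_graph(extractions):
    """Build an undirected graph whose edge weights count co-occurrences.

    Assumption: two concepts are linked whenever they were extracted from
    the same seed question.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for item in extractions:
        concepts = sorted(set(item["topics"]) | set(item["knowledge_points"]))
        for a, b in itertools.combinations(concepts, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def sample_concepts(graph, start, steps, rng):
    """Weighted random walk that never revisits a concept.

    Returns the list of visited concepts, beginning with `start`.
    """
    path = [start]
    current = start
    for _ in range(steps):
        candidates = [n for n in graph[current] if n not in path]
        if not candidates:
            break  # dead end: every neighbor was already visited
        weights = [graph[current][n] for n in candidates]
        current = rng.choices(candidates, weights=weights, k=1)[0]
        path.append(current)
    return path


def make_prompt(concepts):
    """Format sampled concepts into a (hypothetical) generation prompt."""
    return (
        "Write a new math question with a step-by-step answer that combines: "
        + ", ".join(concepts)
    )


if __name__ == "__main__":
    graph = build_concept_graph(SEED_EXTRACTIONS)
    concepts = sample_concepts(graph, "fractions", steps=2, rng=random.Random(0))
    print(make_prompt(concepts))
```

In a real pipeline the resulting prompt would be sent to an LLM to produce the new question-answer pair. Sampling connected concepts, rather than independent ones, is one plausible way to keep generated questions coherent while still recombining ideas from different seed questions.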
Tang et al. studied this question.
www.synapsesocial.com/papers/68e75a06b6db6435876d1159 — DOI: https://doi.org/10.48550/arxiv.2403.02884
Zhengyang Tang
Xingxing Zhang
Benyou Wang