Large language models (LLMs) have made significant strides in reasoning capabilities, with ongoing efforts to refine their reasoning through self-correction. However, recent studies suggest that self-correction can be limited or even counterproductive without accurate external knowledge, raising questions about the limits and effectiveness of self-correction. In this paper, we aim to enhance LLMs' self-checking capabilities by carefully designing training data, thereby improving the accuracy of self-correction. We conduct a detailed analysis of error types in mathematical reasoning and develop a tailored prompt, termed "Step CoT Check". We then construct a checking-correction dataset for training models. After training on the original CoT data combined with the checking-correction data, we observe that models improve their self-checking capabilities, which enhances their self-correction capacity and eliminates the need for external feedback or ground-truth labels to determine the endpoint of correction. We compare the performance of models fine-tuned with the "Step CoT Check" prompt against those fine-tuned using other prompts within the context of checking-correction data. "Step CoT Check" outperforms the other two check formats on models with larger parameter counts, providing more precise feedback and thus achieving a higher rate of correctness. For reproducibility, all datasets and code are available at https://github.com/bammt/Learn-to-check.
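The idea of letting the model's own check decide when correction ends can be illustrated with a minimal sketch. This is not the paper's implementation: the function `self_correct`, the `check` and `correct` callables, the `max_rounds` cap, and the toy arithmetic stand-ins are all hypothetical names introduced here for illustration. In practice `check` and `correct` would presumably be calls to the fine-tuned model.

```python
from typing import Callable, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
#   check(question, solution)             -> (is_correct, feedback)
#   correct(question, solution, feedback) -> revised solution
Checker = Callable[[str, str], Tuple[bool, str]]
Corrector = Callable[[str, str, str], str]


def self_correct(question: str, solution: str, check: Checker,
                 correct: Corrector, max_rounds: int = 3) -> Tuple[str, int]:
    """Revise `solution` until the check accepts it or `max_rounds` is reached.

    No ground-truth label is consulted: the check's verdict alone ends the loop.
    Returns the final solution and the number of corrections applied.
    """
    rounds = 0
    for _ in range(max_rounds):
        ok, feedback = check(question, solution)
        if ok:
            break
        solution = correct(question, solution, feedback)
        rounds += 1
    return solution, rounds


def toy_check(question: str, solution: str) -> Tuple[bool, str]:
    """Toy stand-in for a step-wise check: verify each 'a + b = c' line."""
    for i, line in enumerate(solution.splitlines()):
        lhs, rhs = line.split("=")
        a, b = lhs.split("+")
        if int(a) + int(b) != int(rhs):
            return False, f"step {i}"  # feedback names the faulty step
    return True, ""


def toy_correct(question: str, solution: str, feedback: str) -> str:
    """Toy stand-in for correction: recompute the step named in the feedback."""
    lines = solution.splitlines()
    i = int(feedback.split()[1])
    lhs, _ = lines[i].split("=")
    a, b = lhs.split("+")
    lines[i] = f"{a.strip()} + {b.strip()} = {int(a) + int(b)}"
    return "\n".join(lines)
```

With the toy stand-ins, `self_correct("add the numbers", "1 + 2 = 3\n3 + 4 = 8", toy_check, toy_correct)` repairs the second step in one round and then stops because the check passes. The loop cap is a design choice of this sketch, guarding against a check that never accepts.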
Zhang Che
Zhenyang Xiao
Chengcheng Han
Che et al. studied this question.
www.synapsesocial.com/papers/68e786ffb6db6435876f9c1a — DOI: https://doi.org/10.48550/arxiv.2402.13035