August 21, 2025 | Open Access

Distilling Reasoning Ability From Large Language Models With Adaptive Thinking

Key Points

The adaptive-thinking mechanism enhances reasoning ability in small language models, improving overall task performance.
Across 12 datasets, the answer-first (post-thinking) approach made answer correctness less dependent on rationale accuracy and improved inference efficiency.
Experiments across 2 language models show the effectiveness of integrating pre-thinking and post-thinking.
This approach may preserve complex question analysis while addressing rationale sensitivity in small language models.

Abstract

Chain-of-thought distillation (CoT-distillation) aims to endow small language models (SLMs) with reasoning ability and thereby improve their performance on specific tasks, by having them imitate the reasoning procedure of large language models (LLMs) rather than simply predict the answers. Most existing CoT-distillation methods adopt a pre-thinking mechanism, in which the SLM generates a rationale before answering. Pre-thinking enables the SLM to analyze questions, but it makes answer correctness sensitive to minor errors in the rationale. Therefore, we propose a robust post-thinking mechanism that generates the answer before the rationale. Thanks to this answer-first setting: 1) the answer escapes the rationale-sensitivity problem; 2) the rationale serves as an error amplifier, making the SLM focus on learning hard samples; and 3) inference efficiency also improves. Although post-thinking brings many advantages, it may lose the ability to analyze complex questions compared to pre-thinking. Therefore, a plug-and-play adaptive-thinking mechanism is proposed to integrate the merits of pre-thinking and post-thinking, in which a perception module based on soft prompt tuning prompts the SLM to answer first or think first according to the complexity of the question. Extensive experiments are conducted across 12 datasets and 2 language models to demonstrate the effectiveness of the proposed mechanism.
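
The difference between the two orderings described in the abstract can be made concrete with a minimal sketch of how distillation targets might be laid out. Everything here is an illustrative assumption rather than the paper's implementation: the `Sample` fields, the `Answer:` / `Rationale:` templates, and especially the boolean `is_complex` flag, which stands in for the perception module that the paper builds with soft prompt tuning.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One distillation example (hypothetical field names)."""
    question: str
    rationale: str  # reasoning text produced by the teacher LLM
    answer: str


def pre_thinking_target(s: Sample) -> str:
    # Rationale first: the answer tokens are generated after, and therefore
    # conditioned on, the rationale the student has just produced.
    return f"Rationale: {s.rationale} Answer: {s.answer}"


def post_thinking_target(s: Sample) -> str:
    # Answer first: the answer tokens come before any rationale tokens,
    # so decoding could stop as soon as the answer has been emitted.
    return f"Answer: {s.answer} Rationale: {s.rationale}"


def adaptive_target(s: Sample, is_complex: bool) -> str:
    # `is_complex` is a stand-in for the perception module's decision;
    # the paper derives that decision via soft prompt tuning, not a flag.
    return pre_thinking_target(s) if is_complex else post_thinking_target(s)


if __name__ == "__main__":
    sample = Sample("What is 2 + 3?", "2 plus 3 equals 5.", "5")
    print(adaptive_target(sample, is_complex=False))
    print(adaptive_target(sample, is_complex=True))
```

In this layout, an error in the rationale text can only influence the answer in the pre-thinking ordering, which is the sensitivity the abstract attributes to pre-thinking.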
