Chain-of-thought distillation (CoT-distillation) aims to endow small language models (SLMs) with reasoning ability, and thereby improve their performance on specific tasks, by training them to imitate the reasoning procedure of large language models (LLMs) rather than only predict the answers. Most existing CoT-distillation methods adopt a pre-thinking mechanism, in which the SLM generates a rationale before answering. Pre-thinking enables the SLM to analyze the question, but it makes answer correctness sensitive to minor errors in the rationale. We therefore propose a robust post-thinking mechanism that generates the answer before the rationale. This answer-first setting has three benefits: 1) the answer is no longer sensitive to errors in the rationale; 2) the rationale serves as an error amplifier, making the SLM focus on learning hard samples; and 3) inference efficiency can also improve. Despite these advantages, post-thinking may be less able than pre-thinking to analyze complex questions. We therefore also propose a plug-and-play adaptive-thinking mechanism that integrates the merits of pre-thinking and post-thinking, in which a perception module based on soft prompt tuning prompts the SLM to answer first or think first according to the complexity of the question. Extensive experiments across 12 datasets and 2 language models demonstrate the effectiveness of the proposed mechanism.
Chen et al. (Wed,) studied this question.
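
To make the difference between the two orderings concrete, the following minimal sketch shows how a distillation target might be laid out for pre-thinking and for post-thinking. The `Example` class, the function names, and the `Answer:` / `Rationale:` field labels are illustrative assumptions, not the format used in the experiments, and the soft-prompt perception module of adaptive-thinking is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One distillation sample (hypothetical container for illustration)."""
    question: str
    rationale: str  # reasoning text produced by the teacher LLM
    answer: str


def build_target(ex: Example, mode: str) -> str:
    """Format the sequence the student SLM is trained to generate.

    The field labels are assumed for this sketch only.
    """
    if mode == "pre":   # pre-thinking: rationale first, then answer
        return f"Rationale: {ex.rationale} Answer: {ex.answer}"
    if mode == "post":  # post-thinking: answer first, then rationale
        return f"Answer: {ex.answer} Rationale: {ex.rationale}"
    raise ValueError(f"unknown mode: {mode!r}")


def extract_answer(generated: str) -> str:
    """Recover the answer from a generated sequence in either ordering.

    Assumes the rationale itself never contains the field labels.
    """
    after_label = generated.split("Answer:", 1)[1]
    return after_label.split("Rationale:", 1)[0].strip()


ex = Example(
    question="What is 3 + 4 * 2?",
    rationale="4 * 2 = 8, and 3 + 8 = 11.",
    answer="11",
)
pre = build_target(ex, "pre")
post = build_target(ex, "post")
```

In the post-thinking layout the answer tokens precede the rationale, so in an autoregressive student they are generated without conditioning on rationale tokens; in the pre-thinking layout the answer is conditioned on everything the student wrote before it.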