Chain-of-Thought (CoT) prompting has been shown to improve the performance of large language models (LLMs) on a wide range of tasks, including arithmetic, common-sense, and symbolic reasoning. However, this improvement depends on crafting effective CoT prompts. More recent work has shown that CoT reasoning paths are often inherently present in top-k alternative decoding sequences, even in the absence of any specific prompting technique. In this study, we propose a new fine-tuning method that exploits this property by targeting only two specific tokens of these pre-existing CoT responses. We demonstrate that fine-tuning only two tokens using the model’s own implicitly generated CoT paths yields a significant efficiency gain, reducing training time while still achieving meaningful performance improvements. Evaluated on arithmetic datasets with the Phi-2 model under greedy decoding, the method achieved improvements of 22.7% on MultiArith, 9.0% on GSM8K, and 2.3% on SVAMP, while reducing processing time by over 90% compared to LoRA fine-tuning. Code is publicly available at: https://github.com/paulosantosneto/2tft
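The idea of "top-k alternative decoding sequences" mentioned in the abstract can be illustrated with a minimal sketch: instead of always taking the single most likely first token, branch on the k most likely first tokens and continue each branch greedily. The toy model below (`toy_next_token`), its probability table, and the helper names `greedy_continue` and `top_k_paths` are hypothetical and exist only for illustration; they are not the authors' implementation, which is available in the repository linked above. The sketch covers only the branching step, not the two-token fine-tuning itself.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a "model" maps a token prefix to a next-token distribution.
NextTokenFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def toy_next_token(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hand-written toy distribution (an assumption, not real model output)."""
    table = {
        (): {"5": 0.6, "I": 0.4},            # greedy first token answers directly
        ("5",): {"<eos>": 1.0},
        ("I",): {"have": 1.0},               # the alternative branch starts reasoning
        ("I", "have"): {"3+5=8": 0.9, "5": 0.1},
        ("I", "have", "3+5=8"): {"<eos>": 1.0},
        ("I", "have", "5"): {"<eos>": 1.0},
    }
    return table[prefix]


def greedy_continue(next_token: NextTokenFn, prefix: Tuple[str, ...],
                    max_len: int = 16) -> List[str]:
    """Extend a prefix by repeatedly taking the most likely next token."""
    tokens = list(prefix)
    while len(tokens) < max_len:
        dist = next_token(tuple(tokens))
        tok = max(dist, key=dist.get)
        if tok == "<eos>":
            break
        tokens.append(tok)
    return tokens


def top_k_paths(next_token: NextTokenFn, k: int, max_len: int = 16) -> List[List[str]]:
    """Branch on the k most likely first tokens, then decode each branch greedily."""
    first = next_token(())
    starts = sorted(first, key=first.get, reverse=True)[:k]
    return [greedy_continue(next_token, (tok,), max_len) for tok in starts]


paths = top_k_paths(toy_next_token, k=2)
for rank, path in enumerate(paths):
    print(rank, " ".join(path))
```

In this toy setup, the greedy path (rank 0) answers directly, while the second-ranked first token leads to a path that contains an intermediate reasoning step. This mirrors, in a simplified way, the observation that reasoning paths can appear among alternative decodings without any CoT prompt.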
Paulo S. Neto
Universidade Federal do Rio Grande
Jardel D. S. Dyonisio
Universidade Federal do Rio Grande
João F. S. S. Lemos
Universidade Federal do Rio Grande
Neural Computing and Applications
Neto et al. studied this question.
synapsesocial.com/papers/69e9b91385696592c86ebf42 — DOI: https://doi.org/10.1007/s00521-026-12052-9