This paper introduces the Correctness, Coherence, and Comprehensiveness CoT Decoding (3C) framework, which aims to improve the performance of large language models (LLMs) on complex reasoning tasks. The framework combines chain-of-thought (CoT) reasoning, retrieval, and correctness evaluation to generate high-quality reasoning steps, and it improves the coherence and comprehensiveness of reasoning chains through graph attention networks (GAT), Siamese networks, and multi-task optimization. Experimental results show that 3C outperforms baseline models, with F1 score improvements of +7.4 (73.7), +4.4 (71.1), +6.7 (43.3), +3.0 (44.7), and +1.2 (83.4) on HotpotQA, 2WikiMultiHopQA, MuSiQue, FERMI, and StrategyQA, respectively. The average inference time per example across the five datasets is 95.8 s, comparable to other methods, indicating a balanced trade-off between accuracy and efficiency. The 3C (13B) model achieves F1 scores of 67.1 on HotpotQA and 64.3 on 2WikiMultiHopQA, outperforming other models of similar size. With only 18.6% of the parameters of 3C (70B), 3C (13B) retains 87.9% of its F1 performance across the five datasets, which supports the effectiveness of 3C in smaller models and its scalability. Ablation studies further validate the critical roles of the correctness, coherence, and comprehensiveness modules in improving performance. In conclusion, 3C provides an efficient and scalable solution that improves LLM accuracy on complex reasoning tasks while keeping inference time comparable to other methods.
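The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only and rests on two assumptions not confirmed by the abstract: that each parenthesized value is the absolute F1 of 3C and the signed value is its gain over a comparison model, and that the nominal sizes 13B and 70B can be treated as exact parameter counts. The names `reported`, `implied_comparison_f1`, and `parameter_share` are hypothetical helpers, not part of the paper.

```python
# Figures quoted in the abstract: dataset -> (F1 gain, reported F1).
# Assumption: the reported F1 belongs to 3C and the gain is measured
# against a comparison model that the abstract does not name.
reported = {
    "HotpotQA": (7.4, 73.7),
    "2WikiMultiHopQA": (4.4, 71.1),
    "MuSiQue": (6.7, 43.3),
    "FERMI": (3.0, 44.7),
    "StrategyQA": (1.2, 83.4),
}


def implied_comparison_f1(gain: float, f1: float) -> float:
    """F1 of the comparison model implied by a reported gain and score."""
    return round(f1 - gain, 1)


def parameter_share(small_billions: float, large_billions: float) -> float:
    """Size of the smaller model as a percentage of the larger one."""
    return round(100 * small_billions / large_billions, 1)


# Implied comparison scores, derived under the assumption stated above.
comparison = {
    name: implied_comparison_f1(gain, f1)
    for name, (gain, f1) in reported.items()
}

# Assumption: nominal model sizes stand in for exact parameter counts.
share = parameter_share(13, 70)

for name, score in comparison.items():
    print(f"{name}: implied comparison F1 = {score}")
print(f"13B as a share of 70B parameters: {share}%")
```

Under these assumptions the parameter share works out to 18.6%, matching the abstract. The 87.9% F1 retention cannot be re-derived here, because the abstract gives 3C (13B) scores for only two of the five datasets.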
Guangjie Lu
Southwest University
Weixiao Zhan
University of California, San Diego
Lin Peng
Yunnan Agricultural University
Journal of King Saud University - Computer and Information Sciences
Lu et al. studied this question.
synapsesocial.com/papers/69b3acc502a1e69014cceb1c — DOI: https://doi.org/10.1007/s44443-026-00615-8