What question did this study set out to answer?

The aim is to evaluate the explainability of Chains of Thought in large language models, focusing on their reasoning capabilities.

March 26, 2026 | Open Access

Testing Explainability of Chain of Thought for Large Language Models

Key Points

Proposes an automated approach for testing how sensitive model responses are to the evidence a Chain of Thought (CoT) cites for itself, from sufficiency and necessity perspectives, under context intervention.
Intervenes in the reasoning chain by altering the input context.
Measures behavioral consistency as a proxy for CoT faithfulness.
Tests mainstream open-source large language models on multi-hop question-answering tasks.
Experimental results indicate that the self-stated reasoning chain is not sufficient to account for the model's responses.
The results also indicate that the self-stated reasoning chain is not necessary for those responses.
CoTs therefore cannot fully explain model behavior.

Abstract

Large Language Models (LLMs) have demonstrated superior abilities in complex tasks such as text generation, reasoning, and question answering. However, LLMs become harder to explain as their parameter counts and complexity grow. Chains of Thought (CoTs) guide the model to perform step-by-step reasoning and effectively enhance its reasoning ability. The multi-step rationales verbalized in a CoT are widely regarded as the model's own explanation of its behavior. This paper proposes an automated approach to testing the behavioral sensitivity of responses to self-cited evidence in CoTs, from sufficiency and necessity perspectives, under context intervention. Specifically, we intervene in the reasoning chain by changing the input context and measure behavioral consistency as a proxy for the faithfulness of the CoT. We test the CoT rationales of mainstream open-source LLMs on multi-hop question-answering tasks. The experimental results show that the self-stated reasoning chain is neither sufficient nor necessary. The CoT cannot fully explain the behavior of LLMs.
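
The abstract does not spell out the intervention procedure or the consistency metric, so the following is only a minimal sketch of what sufficiency and necessity checks under context intervention might look like. Everything in it is an assumption made for illustration: the `Model` interface, the functions `is_sufficient` and `is_necessary`, the exact-match comparison of answers, and the `toy_model` stand-in, which is a deterministic rule and not an LLM.

```python
from typing import Callable, Sequence, Set

# Hypothetical model interface: (question, context passages) -> answer string.
Model = Callable[[str, Sequence[str]], str]


def is_sufficient(model: Model, question: str,
                  context: Sequence[str], cited: Set[int]) -> bool:
    """Keep only the passages the CoT cites; a sufficient rationale
    should leave the answer unchanged."""
    original = model(question, context)
    only_cited = [p for i, p in enumerate(context) if i in cited]
    return model(question, only_cited) == original


def is_necessary(model: Model, question: str,
                 context: Sequence[str], cited: Set[int]) -> bool:
    """Remove the passages the CoT cites; a necessary rationale
    should cause the answer to change."""
    original = model(question, context)
    without_cited = [p for i, p in enumerate(context) if i not in cited]
    return model(question, without_cited) != original


def toy_model(question: str, context: Sequence[str]) -> str:
    """Illustrative stand-in for an LLM: it answers from one 'shortcut'
    passage and ignores the rest of the context."""
    if any("capital of France" in p for p in context):
        return "Paris"
    return "unknown"


QUESTION = "What is the capital of the country where Alice was born?"
CONTEXT = [
    "Alice was born in Lyon.",
    "Lyon is a city in France.",
    "The capital of France is Paris.",
]
# Assume the model's CoT cited only the first two passages.
CITED = {0, 1}

print(is_sufficient(toy_model, QUESTION, CONTEXT, CITED))
print(is_necessary(toy_model, QUESTION, CONTEXT, CITED))
```

In this toy setup the cited passages fail both checks, because the stand-in model depends on a passage its rationale never mentions. This is the kind of mismatch between stated and actual evidence that such tests are meant to surface; a real evaluation would replace `toy_model` with calls to an LLM and would likely need a softer answer comparison than exact string equality.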

Cite This Study

Chen et al. studied this question.

synapsesocial.com/papers/69c4cd3efdc3bde44891946a https://doi.org/10.3390/app16073112
