What does this research mean for the field?

Reverse Markov Chains provide a stable and causal method for interpreting the outputs of large language models by addressing the black-box problem. Novelty: methodological. Consensus alignment: neutral.

What question did this study set out to answer?

The aim is to improve the interpretability of large language models by addressing the black-box problem.

March 8, 2026 · Open Access

Attributive Reasoning for interpreting Large Language Models using Reverse Markov Chains

Key Points

The aim is to improve the interpretability of large language models by addressing the black-box problem.
Introduced Reverse Markov Chains (RMC) for attribution analysis.
Incorporated Integrated Gradients for local sensitivity measurements.
Employed L3-Shapley values for evaluating coalitional causality.
Applied reverse posterior weighting for trajectory plausibility analysis.
Reverse posterior weighting enhances the stability of attribution across similar output-generating trajectories.
Achieved theoretical guarantees based on axioms of Integrated Gradients and L3-Shapley values.
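
To make the Integrated Gradients ingredient concrete, here is a minimal sketch of standard IG on a toy function, followed by a check of its completeness axiom (attributions sum to the change in output between baseline and input). The toy function `f`, its hand-written gradient `grad_f`, the zero baseline, and the step count are all illustrative assumptions; none of them come from the study, and the code does not reproduce the RMC framework itself.

```python
import numpy as np

def f(x):
    # Hypothetical stand-in for a model output: a smooth scalar function.
    return float(np.tanh(x[0] * x[1]) + x[2] ** 2)

def grad_f(x):
    # Analytic gradient of the toy function above.
    t = 1.0 - np.tanh(x[0] * x[1]) ** 2
    return np.array([t * x[1], t * x[0], 2.0 * x[2]])

def integrated_gradients(x, baseline, steps=200):
    # Approximate the path integral of the gradient along the straight line
    # from baseline to x with a midpoint Riemann sum.
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([grad_f(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

x = np.array([1.0, 2.0, 0.5])
baseline = np.zeros(3)  # assumed baseline; the right choice is task-dependent

attr = integrated_gradients(x, baseline)

# Completeness: attributions should sum to f(x) - f(baseline),
# up to the numerical error of the Riemann sum.
gap = abs(attr.sum() - (f(x) - f(baseline)))
print("attributions:", attr)
print("completeness gap:", gap)
```

In a real setting the gradient would come from automatic differentiation of the model rather than a hand-derived formula, and the baseline would need to be chosen with care.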

Abstract

We address the "black-box problem" in LLMs by tracing outputs to the behavior of their internal states in a way that is stable, causal, and trajectory-aware. Existing attribution methods (IG, SHAP, attention weights) analyze single forward passes, ignore trajectory multiplicity, lack stability under variation, and lack reverse probabilistic admissibility. We introduce Reverse Markov Chains (RMC), a post-hoc framework that integrates Integrated Gradients (local sensitivity), L3-Shapley values (coalitional causality), and reverse posterior weighting (trajectory plausibility). We show that reverse posterior weighting stabilizes attribution across multiple forward trajectories that yield identical outputs. Theoretical guarantees follow from axiomatic IG sensitivity and L3-Shapley admissibility under an SCM approximation.
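
One possible reading of reverse posterior weighting can be sketched as follows: given several forward trajectories that yield the same output, weight each trajectory's attribution vector by its posterior probability given that output (Bayes' rule), then average. This is an assumption about the general idea, not the study's actual formulation. The function names, the priors, the likelihoods, and the attribution values below are all hypothetical.

```python
import numpy as np

def posterior_weights(log_prior, log_lik):
    # Bayes' rule in log space: p(trajectory | output) is proportional to
    # p(trajectory) * p(output | trajectory). Subtracting the max before
    # exponentiating keeps the computation numerically stable.
    log_post = np.asarray(log_prior, dtype=float) + np.asarray(log_lik, dtype=float)
    log_post -= log_post.max()
    w = np.exp(log_post)
    return w / w.sum()

def weighted_attribution(attrs, weights):
    # Posterior-weighted average of per-trajectory attribution vectors.
    return np.asarray(weights) @ np.asarray(attrs)

# Hypothetical per-trajectory attributions over three input features.
attrs = np.array([
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.1, 0.8],  # an implausible trajectory with a very different attribution
])
log_prior = np.log([0.5, 0.4, 0.1])   # assumed trajectory priors
log_lik = np.log([0.9, 0.8, 0.05])    # assumed p(output | trajectory)

w = posterior_weights(log_prior, log_lik)
agg = weighted_attribution(attrs, w)
print("posterior weights:", w)
print("aggregated attribution:", agg)
```

Under this reading, an implausible trajectory contributes little to the aggregate, which is one way the stabilizing effect described in the abstract could arise.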
