What question did this study set out to answer?

This work aims to enhance the transparency and reliability of large language models through a new reasoning architecture.

February 11, 2026 · Open Access

The Interpretability Driven Reasoning Architecture (IDRA) - A Runtime Framework for Calibrated, Coherent, and Self Correcting Large Language Models

Key Points

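Introduces the Interpretability-Driven Reasoning Architecture (IDRA), a runtime system that makes reasoning an explicit, recursively updated state vector.
Implements the Recursive State Vector Engine (RSVE), a feedback-driven mechanism that regulates confidence, alignment, compute allocation, and logical coherence during generation.
Evaluates IDRA with a ten-test suite spanning ambiguity, factual contradiction, safety-critical reasoning, philosophical analysis, and long-term preference extraction.
IDRA maintains calibrated confidence and triggers penalties for high-confidence errors.
The Verification Gate allows confidence to increase only when the increase is supported by evidence.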
Consistent reduction in logical entropy was observed across diverse tasks.

Abstract

Large Language Models (LLMs) have achieved extraordinary generative and reasoning capabilities, yet their internal decision‑making processes remain opaque. Most interpretability methods are post‑hoc, attempting to explain a model’s output only after the forward pass has completed. This reactive paradigm leaves a persistent alignment gap: models can be confidently wrong, logically inconsistent, or sycophantic without any internal mechanism for self‑correction. This work introduces the Interpretability‑Driven Reasoning Architecture (IDRA), a runtime system that transforms reasoning from an implicit by‑product of token prediction into an explicit, recursively updated state vector. IDRA implements the Recursive State Vector Engine (RSVE), a feedback‑driven mechanism that regulates confidence, alignment, compute allocation, and logical coherence during generation. The architecture incorporates a Logic Lattice for structural consistency, a Verification Gate for evidence‑gated confidence updates, and a Miscalibration Penalty that discourages unwarranted certainty. We evaluate IDRA using a ten‑test suite spanning ambiguity, factual contradiction, safety‑critical reasoning, philosophical analysis, and long‑term preference extraction. Empirical results from the IDRALOG dataset show that IDRA maintains calibrated confidence, triggers penalties for high‑confidence errors, enforces evidentiary constraints on confidence increases, and dynamically allocates compute to reduce logical entropy. These behaviors emerge consistently across tasks, suggesting that runtime interpretability architectures can meaningfully improve transparency, safety, and reliability in LLM reasoning.
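The abstract names a Verification Gate and a Miscalibration Penalty but gives no formulas for either. The sketch below is only an illustration of what evidence-gated confidence updates and a penalty on high-confidence errors could look like. The function names, the evidence threshold, the high-confidence cutoff, and the squared-confidence penalty are all assumptions made for this example and are not taken from the paper.

```python
# Illustrative sketch only: every name, threshold, and formula here is a
# hypothetical stand-in, not IDRA's actual implementation.

def gated_confidence_update(current: float, proposed: float,
                            evidence_score: float,
                            evidence_threshold: float = 0.5) -> float:
    """Return the new confidence after an evidence gate.

    A confidence increase is accepted only when evidence_score meets the
    (assumed) threshold. Decreases are always accepted.
    """
    proposed = min(max(proposed, 0.0), 1.0)  # clamp to [0, 1]
    if proposed > current and evidence_score < evidence_threshold:
        return current  # gate closed: reject the unsupported increase
    return proposed


def miscalibration_penalty(confidence: float, correct: bool,
                           high_confidence: float = 0.8) -> float:
    """Return a penalty for a high-confidence error, else 0.0.

    The squared-confidence form is an assumption chosen so that the
    penalty grows with how certain the wrong answer was.
    """
    if correct or confidence < high_confidence:
        return 0.0
    return confidence ** 2


if __name__ == "__main__":
    # Weak evidence: the attempted jump from 0.4 to 0.9 is rejected.
    print(gated_confidence_update(0.4, 0.9, evidence_score=0.2))
    # Strong evidence: the same jump is accepted.
    print(gated_confidence_update(0.4, 0.9, evidence_score=0.7))
    # A confident wrong answer is penalized; a confident right one is not.
    print(miscalibration_penalty(0.9, correct=False))
    print(miscalibration_penalty(0.9, correct=True))
```

In this toy version the gate is asymmetric by design: lowering confidence never requires evidence, while raising it does, which mirrors the abstract's description of evidentiary constraints on confidence increases.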

Authors

Elliot Monteverde
