What question did this study set out to answer?

The aim is to improve Large Language Models by decoupling model depth from parameter count using recurrent architectures.

April 24, 2026 | Open Access

OpenMythos: Breaking the Parameter-Depth Trade-Off via LTI-Stable Recurrent Transformers

Key Points

The aim is to improve Large Language Models by decoupling model depth from parameter count using recurrent architectures.
Developed a Recurrent-Depth Transformer with iterative looping of a transformer block.
Introduced an LTI-stable injection mechanism to ensure numerical stability during deep recurrence.
Implemented Adaptive Computation Time (ACT) for dynamic halting and a hybrid learnable-harmonic loop-index embedding.
Reduced the best validation loss from 1.691 (static baseline) to 1.641 with the learnable depth embedding.
Demonstrated enhanced generalization capabilities through learnable depth embedding.
Achieved faster model convergence in experimental evaluations.
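
The dynamic halting named in the key points can be illustrated with a minimal NumPy sketch of Adaptive Computation Time in its commonly described form: each loop emits a halting probability, and iteration stops once the cumulative probability reaches 1 - eps, with the remainder assigned to the final step. This is not the paper's implementation. The function `act_loop`, the toy `step_fn`, the halting parameters `w_halt` and `b_halt`, and the values of `max_loops` and `eps` are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def act_loop(h0, step_fn, w_halt, b_halt, max_loops=16, eps=0.01):
    """Loop step_fn until the cumulative halting probability reaches 1 - eps.

    Returns the halting-weighted state, the number of loops used,
    and the per-loop weights (which sum to 1).
    """
    h = h0
    cumulative = 0.0
    weights = []
    output = np.zeros_like(h0)
    for n in range(1, max_loops + 1):
        h = step_fn(h, n)                      # one pass through the shared block
        p = float(sigmoid(w_halt @ h + b_halt))  # halting probability for this loop
        if cumulative + p >= 1.0 - eps or n == max_loops:
            remainder = 1.0 - cumulative       # leftover mass goes to the last loop
            weights.append(remainder)
            output = output + remainder * h
            break
        weights.append(p)
        output = output + p * h
        cumulative += p
    return output, n, weights

# Hypothetical stand-in for the looped transformer block.
toy_step = lambda h, n: np.tanh(0.9 * h + 0.1)

h0 = np.zeros(4)
w_halt = np.full(4, 0.5)   # assumed halting weights
b_halt = -1.0              # assumed halting bias

state, n_used, weights = act_loop(h0, toy_step, w_halt, b_halt)
print(n_used, sum(weights))
```

In this sketch the number of loops varies with the input because the halting probability depends on the current state, while the hard cap `max_loops` bounds the compute spent on any single input.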

Abstract

Scaling the reasoning capabilities of Large Language Models typically requires increasing model depth, which imposes a linear penalty on parameter count and memory usage. In this paper, we introduce OpenMythos, a Recurrent-Depth Transformer architecture that decouples computational depth from parameter overhead by iteratively looping a single transformer block. To overcome the numerical instability and signal drift inherent in deep recurrence, we propose a Linear Time-Invariant (LTI) stable injection mechanism that guarantees a spectral radius ρ(A) < 1, ensuring stability across arbitrary loop depths. We further optimize the architecture using Adaptive Computation Time (ACT) for dynamic halting and a hybrid learnable-harmonic loop-index embedding to distinguish between iterative refinement stages. Our experiments on the shakespeare_char dataset demonstrate that the learnable depth embedding significantly enhances generalization, reducing the best validation loss from 1.691 in the static baseline to 1.641 while accelerating convergence. These results suggest that recurrent architectures can effectively emulate the depth of traditional transformers, providing a scalable path toward deep reasoning with a compact parameter footprint.
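
To make the stability condition concrete, here is a minimal NumPy sketch of one way to guarantee ρ(A) < 1: parameterize A as a diagonal matrix whose entries pass through a sigmoid, so every eigenvalue lies in (0, 1). The abstract does not specify how OpenMythos constructs A, so this diagonal parameterization, the names `stable_diagonal` and `run_loop`, and the omission of the nonlinear transformer block are all assumptions. The sketch only shows the linear recurrence h <- A h + B x settling to a fixed point regardless of loop count.

```python
import numpy as np

def stable_diagonal(theta):
    """Map unconstrained parameters to diagonal entries in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-theta))

def spectral_radius(A):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))

def run_loop(a_diag, B, x, num_loops):
    """Iterate h <- a * h + B @ x (linear part only; no transformer block)."""
    h = np.zeros_like(a_diag)
    for _ in range(num_loops):
        h = a_diag * h + B @ x
    return h

rng = np.random.default_rng(0)
d = 8                                     # assumed hidden size for the demo
theta = rng.normal(size=d)                # unconstrained (trainable) parameters
a_diag = stable_diagonal(theta)           # diagonal of A, each entry in (0, 1)
B = rng.normal(size=(d, d)) / np.sqrt(d)  # input injection matrix
x = rng.normal(size=d)                    # injected input, fixed across loops

rho = spectral_radius(np.diag(a_diag))
h_deep = run_loop(a_diag, B, x, num_loops=2000)
h_fixed = (B @ x) / (1.0 - a_diag)        # closed-form fixed point (I - A)^{-1} B x

print(rho < 1.0, np.allclose(h_deep, h_fixed))
```

Because A is diagonal here, its spectral radius is simply its largest entry, and the constraint holds by construction for any value of `theta` rather than being enforced by a penalty or by clipping.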
