Transformer architectures dominate modern language modeling but incur O (N²) computational and memory costs with respect to sequence length, limiting scalability. We introduce LaminarNet, a linear-time architecture that replaces quadratic self-attention with two key mechanisms: Geometric Drift Field (GDF) for selective state propagation and Cross-Stratum Routing (CSR) for multi-scale hierarchical token interaction. LaminarNet achieves significant improvements in efficiency and performance. In a parameter-matched benchmark (49M parameters), it reduces perplexity by 38. 3%, increases throughput by 63. 3%, and reduces VRAM usage by 38. 8% compared to Transformers. Additionally, a 437M-parameter model trained on 9. 56B Turkish tokens demonstrates strong scaling behavior, achieving a perplexity of 11. 38. Code: https: //github. com/Uunan/LaminarNet Model: https: //huggingface. co/Uunan/LaminarNet₄37MTurkishBase
Building similarity graph...
Analyzing shared references across papers
Loading...
Ugurhan Colak
Building similarity graph...
Analyzing shared references across papers
Loading...
Ugurhan Colak (Wed,) studied this question.
www.synapsesocial.com/papers/69be38da6e48c4981c6798c7 — DOI: https://doi.org/10.5281/zenodo.19098613