What question did this study set out to answer?

The study asks whether the softmax attention mechanism in language models can be replaced by continuous-time phase synchronization among coupled oscillators.

June 20, 2026 · Open Access

Emergence Over Attention: Continuous-Time Phase Synchronization as a Computational Primitive

Key Points

Developed the Waveformer, a 232M-parameter model that uses Kuramoto oscillator phase synchronization instead of softmax attention.
Pre-trained a 207M-parameter variant from scratch on 184K curated multi-domain entries for 50,000 steps, reaching a perplexity of 2.71.
Used a fused CUDA kernel to accelerate the mean-field coupling computation.
Instruction tuning for 27,000 steps reduced validation loss monotonically from 2.244 to 2.180.
The instruction-tuned model acts as a conversational assistant with domain-routed generation behavior.
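
The mean-field coupling named in the key points rests on the identity sin(a - b) = sin(a)cos(b) - cos(a)sin(b), which lets the sum over all oscillator pairs be rewritten with two global averages. The sketch below is a minimal pure-Python illustration of that identity, not the paper's kernel: the function names, the 1/N normalization, the omission of a coupling strength, and the sample phases are all assumptions made for the example.

```python
import math

def coupling_pairwise(theta):
    """O(N^2) reference: c_i = (1/N) * sum_j sin(theta_j - theta_i)."""
    n = len(theta)
    return [sum(math.sin(tj - ti) for tj in theta) / n for ti in theta]

def coupling_mean_field(theta):
    """O(N) version: c_i = S * cos(theta_i) - C * sin(theta_i),
    where S and C are the mean sine and mean cosine of all phases."""
    n = len(theta)
    s = sum(math.sin(t) for t in theta) / n  # mean sine
    c = sum(math.cos(t) for t in theta) / n  # mean cosine
    return [s * math.cos(ti) - c * math.sin(ti) for ti in theta]

# Illustrative phases in radians (hypothetical values).
theta = [0.1, 1.3, 2.9, 4.0, 5.5]
pairwise = coupling_pairwise(theta)
mean_field = coupling_mean_field(theta)

# Largest disagreement between the two formulations.
max_gap = max(abs(a - b) for a, b in zip(pairwise, mean_field))
```

Both functions return the same coupling values up to floating-point roundoff, but the second never forms the N-by-N matrix of pairwise phase differences, which is the sense in which the quadratic term is factored away.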

Abstract

We present the Waveformer, a 232M-parameter continuous-time language model that replaces the O(N²) softmax attention mechanism with Kuramoto oscillator phase synchronization. Tokens do not attend to each other; they perturb a fixed-size ensemble of coupled oscillators whose synchronization dynamics perform the computation. Position is encoded as an irrational KAM frequency, requiring zero learned parameters. The attention matrix is eliminated via sine-difference mean-field factorization of the Kuramoto coupling term. The macro-architecture is structurally pre-validated using our previously published cross-layer phase coherence analysis (DOI: 10.5281/zenodo.20720827), which identifies universal constructive pockets and conflict anchors across four model scales from 124M to 7B parameters. We pre-train a 207M-parameter variant from scratch for 50,000 steps using a phased developmental curriculum over 184K curated multi-domain entries, achieving perplexity 2.71. Instruction tuning via ChatML masked fine-tuning for 27,000 steps produces a conversational assistant with monotonically decreasing validation loss (2.244 to 2.180) and domain-routed generation behavior. A refined 232M architecture replaces all Q/K/V projections with raw oscillator attention and uses an algebraically reversible Asynchronous Leapfrog integrator for the backward pass, storing only final phase states and reconstructing the trajectory on-the-fly. Forward+backward VRAM is approximately 850MB across sequence lengths 32-512 tokens on an 8GB NVIDIA RTX 3050. A fused CUDA kernel accelerates the mean-field coupling computation.
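
The abstract's memory claim depends on the integrator being algebraically invertible, so that earlier states can be recomputed from the final one instead of being stored. The sketch below shows one common formulation of an Asynchronous Leapfrog step applied to plain Kuramoto dynamics, together with its exact inverse. It is an assumption-laden illustration rather than the paper's implementation: the function names, step size, coupling strength `k`, natural frequencies `omega`, and the choice to initialize the velocity from the right-hand side are all hypothetical.

```python
import math

def kuramoto_rhs(theta, omega, k):
    """dtheta_i/dt = omega_i + k * mean_j sin(theta_j - theta_i), mean-field form."""
    n = len(theta)
    s = sum(math.sin(t) for t in theta) / n
    c = sum(math.cos(t) for t in theta) / n
    return [w + k * (s * math.cos(t) - c * math.sin(t)) for t, w in zip(theta, omega)]

def alf_step(theta, vel, omega, k, h):
    """One forward step on the augmented state (theta, vel)."""
    mid = [t + 0.5 * h * v for t, v in zip(theta, vel)]       # half step with old velocity
    f = kuramoto_rhs(mid, omega, k)
    vel_new = [2.0 * fi - v for fi, v in zip(f, vel)]         # reflect velocity about f(mid)
    theta_new = [m + 0.5 * h * v for m, v in zip(mid, vel_new)]
    return theta_new, vel_new

def alf_step_inverse(theta_new, vel_new, omega, k, h):
    """Algebraic inverse of alf_step: recovers (theta, vel) from (theta_new, vel_new)."""
    mid = [t - 0.5 * h * v for t, v in zip(theta_new, vel_new)]
    f = kuramoto_rhs(mid, omega, k)
    vel = [2.0 * fi - v for fi, v in zip(f, vel_new)]
    theta = [m - 0.5 * h * v for m, v in zip(mid, vel)]
    return theta, vel

# Hypothetical settings for the demonstration.
omega = [0.9, 1.0, 1.1, 1.2]
theta0 = [0.0, 0.5, 1.0, 1.5]
k, h, steps = 1.5, 0.05, 50
vel0 = kuramoto_rhs(theta0, omega, k)

# Forward pass: keep only the final state.
theta, vel = theta0, vel0
for _ in range(steps):
    theta, vel = alf_step(theta, vel, omega, k, h)
theta_final, vel_final = theta, vel

# Backward pass: reconstruct the initial state from the final one.
for _ in range(steps):
    theta, vel = alf_step_inverse(theta, vel, omega, k, h)

recon_error = max(abs(a - b) for a, b in zip(theta + vel, theta0 + vel0))
```

In exact arithmetic the reconstruction is exact; in floating point it is accurate only up to accumulated roundoff, which is why the example measures `recon_error` rather than asserting equality.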
