We present the Waveformer, a 232M-parameter continuous-time language model that replaces the O(N²) softmax attention mechanism with Kuramoto oscillator phase synchronization. Tokens do not attend to each other; instead, they perturb a fixed-size ensemble of coupled oscillators whose synchronization dynamics perform the computation. Position is encoded as an irrational KAM frequency, which requires no learned parameters. The attention matrix is eliminated by a sine-difference mean-field factorization of the Kuramoto coupling term. The macro-architecture is structurally pre-validated using our previously published cross-layer phase coherence analysis (DOI: 10.5281/zenodo.20720827), which identifies universal constructive pockets and conflict anchors across four model scales from 124M to 7B parameters. We pre-train a 207M-parameter variant from scratch for 50,000 steps using a phased developmental curriculum over 184K curated multi-domain entries, reaching a perplexity of 2.71. Instruction tuning via ChatML masked fine-tuning for 27,000 steps produces a conversational assistant with monotonically decreasing validation loss (from 2.244 to 2.180) and domain-routed generation behavior. A refined 232M-parameter architecture replaces all Q/K/V projections with raw oscillator attention and uses an algebraically reversible Asynchronous Leapfrog integrator for the backward pass, storing only the final phase states and reconstructing the trajectory on the fly. Forward-plus-backward VRAM usage is approximately 850 MB for sequence lengths of 32 to 512 tokens on an 8 GB NVIDIA RTX 3050. A fused CUDA kernel accelerates the mean-field coupling computation.
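The sine-difference mean-field factorization mentioned in the abstract can be illustrated with the identity sin(a - b) = sin(a)cos(b) - cos(a)sin(b), which turns the all-pairs Kuramoto coupling sum into two global averages. The following is a minimal sketch, assuming the textbook Kuramoto coupling term with uniform coupling strength; the function names are hypothetical and this is not the Waveformer's own implementation or its CUDA kernel.

```python
import math

def coupling_pairwise(theta):
    """Direct O(N^2) evaluation of (1/N) * sum_j sin(theta_j - theta_i)."""
    n = len(theta)
    return [sum(math.sin(tj - ti) for tj in theta) / n for ti in theta]

def coupling_mean_field(theta):
    """O(N) evaluation of the same quantity.

    Uses sin(tj - ti) = sin(tj)cos(ti) - cos(tj)sin(ti), so the sum over j
    reduces to the ensemble averages of sin and cos, computed once.
    """
    n = len(theta)
    mean_sin = sum(math.sin(t) for t in theta) / n
    mean_cos = sum(math.cos(t) for t in theta) / n
    return [mean_sin * math.cos(ti) - mean_cos * math.sin(ti) for ti in theta]

# Illustrative phases (arbitrary values, not taken from the paper).
phases = [0.1, 1.3, 2.7, 4.0, 5.5]
direct = coupling_pairwise(phases)
fast = coupling_mean_field(phases)

# The two evaluations should agree up to floating-point rounding.
max_gap = max(abs(a - b) for a, b in zip(direct, fast))
```

Because the two averages are shared by every oscillator, the cost per update grows linearly with the number of oscillators, and no N-by-N matrix is ever materialized.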
Nossair Bajddi (Thu,) studied this question.