We present a neural architecture for semantic representation grounded in the physical acoustics of speech production rather than statistical co-occurrence. The Receiver Model processes verbal roots by converting their sequential acoustic formant trajectories (F1, F2 frequencies) into heterogeneous continuous-time Ordinary Differential Equation (ODE) dynamics. No global error signal, backpropagation, or word-level semantic labels are used at any stage. We prove formally that (i) the resonance state update of the network is structurally identical to Mamba's selective state-space recurrence via zero-order-hold discretization; (ii) under local unconstrained dynamics, the weight matrix maximizes an explicit structural objective; and (iii) the harmonic coherence metric H over the phonosemantic manifold M is a proper pseudometric satisfying the triangle inequality. Empirically, we report that continuous-time sequential encoding achieves ARI = 0. 069 on the 150-root Paninian benchmark against independently-derived phenomenological axis labels, providing a +61% performance leap over static embedding geometries. We meticulously characterize the representational capacity ceiling of single-layer architectures, demonstrating a stable 0. 06 ARI barrier across exhaustive structural and Group Relative Policy Optimization (GRPO) sweeps. This defines the first empirical boundary for cross-locus semantic abstraction in zero-weight phonological reservoirs, providing formal justification that deep categorical meaning requires secondary hierarchical layer processing. The architecture achieves O (1) context memory and O (L) time, with physical, interpretable state coordinates.
Amit Kumar (Thu,) studied this question.