We present the Devavāṇī-Derived Interpretable Network (DDIN), a neural architecture for semantic representation grounded in the physical acoustics of speech production rather than statistical co-occurrence. The DDIN Receiver Model processes Sanskrit verbal roots by converting their acoustic formant profiles (F1, F2 frequencies derived from source-filter theory and locus equations) into heterogeneous Leaky Integrate-and-Fire (LIF) spike trains, which drive a reservoir governed solely by Bienenstock-Cooper-Munro (BCM) local plasticity. No global error signal, backpropagation, or word-level semantic labels are used at any stage. We prove formally that (i) the resonance state update of the DDIN is structurally identical to Mamba's selective state-space recurrence via zero-order-hold discretization; (ii) under BCM dynamics, the fixed-point weight matrix maximizes an explicit objective proportional to output variance minus a threshold-penalty term; (iii) the harmonic coherence metric H over the phonosemantic manifold M is a proper pseudometric satisfying the triangle inequality; and (iv) the morphophonological junction operator is an associative, surjective partial function on phoneme sequences, with a computable complexity O(1) per application. Empirically, we report that the 23-dimensional formant-first v16 embedding achieves ARI = 0.0366 on the 150-root Paninian benchmark against independently-derived phenomenological axis labels, without any optimization. We further show that formant geometry provides +14 percentage points over phoneme identity alone (linear probe, p < 0.001). These results constitute the first empirical proof that bodily-grounded phonological representations organize into semantic categories without gradient-based optimization. The architecture achieves O(1) context memory and O(L) time, with interpretable state coordinates that Mamba's latent vectors lack.
Dr. Amit Kumar (Tue,) studied this question.