What question did this study set out to answer?

The study aims to develop a neural model for semantic representation grounded in speech acoustics rather than in statistical co-occurrence.

April 16, 2026 · Open Access

The DDIN Receiver Model: Phonosemantically-Grounded Semantic Clustering Without Backpropagation

Key points

Developed the DDIN architecture focusing on acoustic features of Sanskrit verbal roots.
Converted formant profiles into spike trains using leaky integrate-and-fire dynamics.
Applied BCM local plasticity without backpropagation for model training.
Validated model against a benchmark of 150 Sanskrit roots using independent labels.
Achieved an Adjusted Rand Index (ARI) of 0.0366 on the Paninian benchmark without optimization.
Formant geometry improved semantic categorization by 14 percentage points compared to phoneme identity alone (p < 0.001).
Demonstrated that phonological representations can organize into semantic categories without gradient-based optimization.
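
The encoding and learning steps listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the formant-to-current gain, the time constants, and the threshold update rate are all hypothetical choices made for illustration, and the example formant values (700 Hz, 1200 Hz) are rough textbook figures for an open vowel rather than data from the paper. The sketch drives two leaky integrate-and-fire neurons with currents derived from F1 and F2, then applies one common form of the BCM rule, in which a weight grows when the postsynaptic activity exceeds a sliding threshold and shrinks otherwise.

```python
def lif_spike_train(current, tau=20.0, threshold=1.0, dt=1.0, steps=200):
    """Simulate one leaky integrate-and-fire neuron under a constant current.

    Returns a list of 0/1 values, one per time step.
    """
    v = 0.0
    spikes = []
    for _ in range(steps):
        # Forward-Euler step of dv/dt = (-v + current) / tau
        v += dt * (-v + current) / tau
        if v >= threshold:
            spikes.append(1)
            v = 0.0  # reset the membrane potential after a spike
        else:
            spikes.append(0)
    return spikes


def encode_formants(f1_hz, f2_hz, gain=1.0 / 1000.0, taus=(15.0, 25.0)):
    """Map (F1, F2) to two spike trains.

    The gain and the offset of 1.0 are assumptions that keep both currents
    above threshold; the distinct time constants make the pair heterogeneous.
    """
    currents = (1.0 + gain * f1_hz, 1.0 + gain * f2_hz)
    return [lif_spike_train(c, tau=t) for c, t in zip(currents, taus)]


def bcm_step(weights, x, theta, lr=0.01, theta_rate=0.1):
    """One BCM update: dw_i = lr * x_i * y * (y - theta).

    theta is a sliding threshold that tracks a running average of y**2.
    No error signal or label enters the update; it is purely local.
    """
    y = sum(w * xi for w, xi in zip(weights, x))
    new_weights = [w + lr * xi * y * (y - theta) for w, xi in zip(weights, x)]
    new_theta = theta + theta_rate * (y * y - theta)
    return new_weights, new_theta, y


if __name__ == "__main__":
    trains = encode_formants(700.0, 1200.0)
    rates = [sum(t) / len(t) for t in trains]  # mean firing rate per neuron
    weights, theta = [0.5, 0.5], 0.0
    for _ in range(50):
        weights, theta, y = bcm_step(weights, rates, theta)
    print(rates, weights, theta)
```

A full reservoir would apply the same local update to a recurrent weight matrix over time-resolved spikes; the rate-based, single-unit version here only shows the shape of the rule.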

Abstract

We present the Devavāṇī-Derived Interpretable Network (DDIN), a neural architecture for semantic representation grounded in the physical acoustics of speech production rather than in statistical co-occurrence. The DDIN Receiver Model processes Sanskrit verbal roots by converting their acoustic formant profiles (F1 and F2 frequencies derived from source-filter theory and locus equations) into heterogeneous Leaky Integrate-and-Fire (LIF) spike trains, which drive a reservoir governed solely by Bienenstock-Cooper-Munro (BCM) local plasticity. No global error signal, backpropagation, or word-level semantic labels are used at any stage. We prove formally that (i) the resonance state update of the DDIN is structurally identical to Mamba's selective state-space recurrence under zero-order-hold discretization; (ii) under BCM dynamics, the fixed-point weight matrix maximizes an explicit objective proportional to output variance minus a threshold-penalty term; (iii) the harmonic coherence metric H over the phonosemantic manifold M is a proper pseudometric satisfying the triangle inequality; and (iv) the morphophonological junction operator is an associative, surjective partial function on phoneme sequences, computable in O(1) time per application. Empirically, the 23-dimensional formant-first v16 embedding achieves an Adjusted Rand Index (ARI) of 0.0366 on the 150-root Paninian benchmark against independently derived phenomenological axis labels, without any optimization. We further show that formant geometry provides a gain of 14 percentage points over phoneme identity alone (linear probe, p < 0.001). These results constitute the first empirical proof that bodily-grounded phonological representations organize into semantic categories without gradient-based optimization. The architecture achieves O(1) context memory and O(L) time, with interpretable state coordinates that Mamba's latent vectors lack.
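
Claim (i) and the stated O(1) memory and O(L) time can be made concrete with a small sketch of zero-order-hold (ZOH) discretization. It uses the standard ZOH formulas for a scalar linear state-space model, dh/dt = a*h + b*x, and is not the DDIN resonance update itself, which the abstract does not spell out. The function names and parameter values are hypothetical, and L is assumed here to be the sequence length. Passing a separate step size per input stands in for selectivity, where the step would depend on the input.

```python
import math


def zoh_discretize(a, b, delta):
    """Zero-order-hold discretization of dh/dt = a*h + b*x for scalar a != 0.

    Returns (a_bar, b_bar) such that h_t = a_bar * h_{t-1} + b_bar * x_t
    is exact when x is held constant over a step of length delta.
    """
    a_bar = math.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar


def scan(inputs, deltas, a=-1.0, b=1.0, c=1.0):
    """Run the discretized recurrence and emit y_t = c * h_t.

    The carried state is the single number h regardless of sequence
    length, and the loop makes one pass over the inputs.
    """
    h = 0.0
    outputs = []
    for x, delta in zip(inputs, deltas):
        a_bar, b_bar = zoh_discretize(a, b, delta)
        h = a_bar * h + b_bar * x
        outputs.append(c * h)
    return outputs


if __name__ == "__main__":
    # Constant input with a fixed step: the state follows 1 - exp(-t).
    ys = scan([1.0] * 10, [0.1] * 10)
    print(ys[-1], 1.0 - math.exp(-1.0))
```

With a constant unit input and a = -1, b = 1, the discrete state matches the continuous solution 1 - exp(-t) at each step boundary, which is the property that distinguishes ZOH from a plain Euler step.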
