Transformer-based large language models are built on architectures decades old. The AdamW optimizer and backpropagation — the twin pillars of modern AI training — are computationally expensive by design. AdamW alone requires storing momentum and variance states for every parameter, tripling the memory footprint of any model being trained. Backpropagation requires a full forward and backward pass for every update. VALENCE proposes replacing these mechanisms entirely with a physics-based, ASIC-adjacent methodology executable on consumer graphics hardware. Modern GPUs have invested heavily in ray tracing — hardware specialized to simulate physical light interactions in real time. We propose that these same RT cores can simulate semantic interactions in language space, replacing abstract matrix mathematics with physically traversable geometry. The result is an attention mechanism that scales at **O(log N)** rather than the O(N²) of standard transformer attention, with no backpropagation, no optimizer states, and no hard training cutoff. The transformer was a limitation. Backpropagation is slow and AdamW is ancient. We ripped it all out.
Building similarity graph...
Analyzing shared references across papers
Loading...
Robert Zachary Nemitz
Building similarity graph...
Analyzing shared references across papers
Loading...
Robert Zachary Nemitz (Sat,) studied this question.
www.synapsesocial.com/papers/69d34e949c07852e0af9837f — DOI: https://doi.org/10.5281/zenodo.19421339