We present SpectralAI, a system that replaces the O(N²) matrix multiplication in Mixture-of-Experts (MoE) routing with O(N log N) Bounding Volume Hierarchy (BVH) traversal on dedicated NVIDIA RT Core hardware. Our approach projects token embeddings into 3D geometric space and uses hardware-accelerated BVH traversal for expert selection, achieving a 113–218× routing speedup and a 731× VRAM reduction on a single NVIDIA RTX 5070 Ti. We validate on OLMoE-1B-7B (7B parameters, 64 experts, 16 MoE layers): BVH pre-filter mode achieves a perplexity of 6.79 (+1.5% vs. baseline), RT Core routing runs at 19.1 μs/batch with 13.4M queries/s, and downstream HellaSwag accuracy drops by only 1.1 percentage points. We also introduce the Inception Engine, a nested Instance Acceleration Structure that composes four levels of 3D spaces into an effective 12-dimensional semantic representation, bypassing the hardware's native 3D limitation. To the best of our knowledge, this is the first system to repurpose GPU ray tracing cores for neural network expert routing. The package includes the paper (PDF + markdown source), validation data, and all figures.
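To make the routing idea concrete, the following is a minimal CPU-only sketch of what "project to 3D, then select experts by BVH traversal" could look like. It is not the SpectralAI implementation and does not use RT Cores. The random projection matrix, the random expert positions, the nearest-neighbour selection rule, the top-k value of 8, and all function names (`project_to_3d`, `build_bvh`, `top_k_experts`) are assumptions made purely for illustration. Only the count of 64 experts comes from the abstract.

```python
import heapq
import random

def project_to_3d(embedding, proj):
    """Linear projection of a D-dim embedding to 3D (proj: 3 rows of D weights)."""
    return tuple(sum(w * x for w, x in zip(row, embedding)) for row in proj)

def dist2(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def box_dist2(q, lo, hi):
    """Squared distance from point q to an axis-aligned box (0 if q is inside)."""
    return sum(max(l - c, 0.0, c - h) ** 2 for c, l, h in zip(q, lo, hi))

def build_bvh(items, leaf_size=2):
    """Build a binary BVH over items = [(expert_id, (x, y, z)), ...]."""
    lo = tuple(min(p[i] for _, p in items) for i in range(3))
    hi = tuple(max(p[i] for _, p in items) for i in range(3))
    node = {"lo": lo, "hi": hi}
    if len(items) <= leaf_size:
        node["items"] = items
        return node
    axis = max(range(3), key=lambda i: hi[i] - lo[i])  # split along the widest axis
    items = sorted(items, key=lambda it: it[1][axis])
    mid = len(items) // 2                              # median split
    node["left"] = build_bvh(items[:mid], leaf_size)
    node["right"] = build_bvh(items[mid:], leaf_size)
    return node

def top_k_experts(root, q, k):
    """Best-first traversal that skips boxes farther than the current k-th best."""
    best = []                                          # max-heap of (-dist2, expert_id)
    frontier = [(box_dist2(q, root["lo"], root["hi"]), 0, root)]
    counter = 1                                        # tie-breaker so nodes are never compared
    while frontier:
        d, _, node = heapq.heappop(frontier)
        if len(best) == k and d > -best[0][0]:
            break                                      # every remaining box is farther away
        if "items" in node:
            for eid, p in node["items"]:
                d2 = dist2(q, p)
                if len(best) < k:
                    heapq.heappush(best, (-d2, eid))
                elif d2 < -best[0][0]:
                    heapq.heapreplace(best, (-d2, eid))
        else:
            for child in (node["left"], node["right"]):
                entry = (box_dist2(q, child["lo"], child["hi"]), counter, child)
                heapq.heappush(frontier, entry)
                counter += 1
    return [eid for _, eid in sorted(best, reverse=True)]  # nearest expert first

# Hypothetical setup: 64 experts as in the abstract; everything else is assumed.
rng = random.Random(0)
D, NUM_EXPERTS, TOP_K = 16, 64, 8
proj = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(3)]
experts = [(e, tuple(rng.uniform(-1, 1) for _ in range(3))) for e in range(NUM_EXPERTS)]
bvh = build_bvh(experts)

token = [rng.gauss(0, 1) for _ in range(D)]
q = project_to_3d(token, proj)
selected = top_k_experts(bvh, q, TOP_K)
print(selected)
```

The pruning test against bounding boxes is what lets a traversal visit far fewer than all experts per query. On the hardware described in the abstract, the box tests would be carried out by the RT Cores rather than by Python code such as this.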
Jordi Silvestre Lopez
www.synapsesocial.com/papers/69d894ce6c1944d70ce05ba7 — DOI: https://doi.org/10.5281/zenodo.19457288