What question did this study set out to answer?

The study set out to develop a neuromorphic spike-based framework for large language models that improves both energy efficiency and interpretability.

December 8, 2025 · Open Access

Neuromorphic spike-based large language model

Key Points

Developed a neuromorphic spike-based LLM (NSLLM) framework to improve the energy efficiency and interpretability of large language models.
Transformed LLMs into NSLLMs by converting their behaviors into neural dynamics through rigorous mathematical modeling.
Complemented the conversion with quantization and sparsification.
Used computational neuroscience tools to analyze how the models encode information.
Achieved a dynamic power consumption of only 13.849 watts with a 1.5-billion-parameter NSLLM on a VCK190 FPGA.
Attained an inference throughput of 161.8 tokens per second.
Improved energy efficiency, memory usage, and inference throughput by 19.8×, 21.3×, and 2.2×, respectively, compared with the A800 GPU.
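
The power and throughput figures above can be combined into a rough per-token energy estimate. The following is a minimal sketch, not a calculation taken from the paper: it assumes that dividing throughput by dynamic power is a fair proxy for energy efficiency (static power is ignored), and the function names are hypothetical. The baseline throughput it derives is only what the reported 2.2× ratio implies, not a figure stated in the source.

```python
# Figures reported in the summary for the VCK190 FPGA implementation.
DYNAMIC_POWER_W = 13.849          # dynamic power only; static power not included
THROUGHPUT_TOKENS_PER_S = 161.8   # inference throughput
THROUGHPUT_GAIN_VS_A800 = 2.2     # reported throughput ratio against the A800 GPU


def tokens_per_joule(throughput: float, power: float) -> float:
    """Tokens generated per joule of dynamic energy (tokens/s divided by J/s)."""
    return throughput / power


def joules_per_token(throughput: float, power: float) -> float:
    """Dynamic energy spent per generated token."""
    return power / throughput


def implied_baseline_throughput(throughput: float, gain: float) -> float:
    """Baseline throughput implied by a reported speedup ratio (an inference)."""
    return throughput / gain


if __name__ == "__main__":
    tpj = tokens_per_joule(THROUGHPUT_TOKENS_PER_S, DYNAMIC_POWER_W)
    jpt = joules_per_token(THROUGHPUT_TOKENS_PER_S, DYNAMIC_POWER_W)
    base = implied_baseline_throughput(THROUGHPUT_TOKENS_PER_S, THROUGHPUT_GAIN_VS_A800)
    print(f"{tpj:.2f} tokens per joule (dynamic power only)")
    print(f"{jpt * 1000:.1f} mJ per token (dynamic power only)")
    print(f"{base:.1f} tokens/s implied for the A800 baseline")
```

Under these assumptions the FPGA design works out to roughly 11.7 tokens per joule, or about 86 mJ per token. The paper's own 19.8× energy-efficiency figure may rest on a different definition, so this estimate should not be used to back out the GPU's power draw.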

Abstract

This work proposes a unified neuromorphic spike-based LLM (NSLLM) framework that simultaneously addresses two challenges of large language models (LLMs): high energy consumption and low interpretability. Our framework transforms LLMs into efficient NSLLMs by converting their behaviors into neural dynamics, such as spike trains, through rigorous mathematical modeling, complemented by advanced techniques including quantization and sparsification. This transformation also enables the analysis of information encoding processes using computational neuroscience tools, offering a novel neuroscientific perspective that treats LLMs as neural populations in order to enhance their interpretability. Leveraging a hardware-algorithm co-design paradigm, NSLLM can completely eliminate matrix multiplication (MatMul) while maintaining high performance. We designed a custom MatMul-free hardware core on the VCK190 FPGA to validate a 1.5-billion-parameter NSLLM, achieving a dynamic power consumption of only 13.849 watts and an inference throughput of 161.8 tokens per second. Compared with the A800 GPU, this implementation improves energy efficiency, memory usage, and inference throughput by 19.8×, 21.3×, and 2.2×, respectively. This work provides a unified framework for enhancing both the energy efficiency and the interpretability of LLMs, and offers insights for future neuromorphic chip designs tailored to large models.
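
To make the idea of a MatMul-free, spike-based layer more concrete, the sketch below shows one generic way binary spikes turn a matrix-vector product into pure accumulation. It is an illustration only and is not the paper's conversion method or hardware design: the integrate-and-fire neuron with a soft reset, the constant input current, the four time steps, and the function names `integrate_and_fire` and `spike_accumulate` are all assumptions chosen for simplicity.

```python
def integrate_and_fire(current, threshold=1.0, steps=4):
    """Encode a constant input current as a binary spike train.

    Assumed neuron model: the membrane potential integrates the current at
    each step and, on reaching the threshold, emits a spike and is reduced
    by the threshold (soft reset).
    """
    membrane = 0.0
    spikes = []
    for _ in range(steps):
        membrane += current
        if membrane >= threshold:
            spikes.append(1)
            membrane -= threshold
        else:
            spikes.append(0)
    return spikes


def spike_accumulate(weights, spike_trains):
    """Sum weight columns selected by spikes, using additions only.

    weights[i][j] connects input j to output i. Because each spike is 0 or 1,
    a column of weights is either added to the output or skipped, so no
    multiplication is performed.
    """
    n_out = len(weights)
    out = [0.0] * n_out
    steps = len(spike_trains[0])
    for t in range(steps):
        for j, train in enumerate(spike_trains):
            if train[t]:  # silent inputs cost nothing (sparsity)
                for i in range(n_out):
                    out[i] += weights[i][j]
    return out


if __name__ == "__main__":
    activations = [0.5, 0.25]               # hypothetical input values
    weights = [[1.0, -2.0], [0.5, 3.0]]     # hypothetical 2x2 weight matrix
    trains = [integrate_and_fire(a) for a in activations]
    print(trains)
    print(spike_accumulate(weights, trains))
```

In this toy setting the accumulated output equals the weight matrix applied to the per-input spike counts, which is why a spike-driven core can replace multipliers with adders and skip work whenever an input is silent.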
