Abstract

The deployment of current large language models is severely constrained by the “memory wall” inherent in the von Neumann architecture, where storage and computation are separated. Model size is capped by available GPU memory, a significant share of compute is spent moving data rather than computing, and context length and model capability compete for the same memory. Existing optimization methods are mostly patchwork improvements that fail to address the bottleneck systematically.

This paper proposes the Storage-Compute Anchoring architecture, in which large language model weights reside permanently in the storage medium. Through the CXL protocol, CPUs/GPUs access these weights in situ, with main memory serving only as a working-memory region. Building on this core idea, we integrate and refine three progressive engineering paths, each corresponding to a different degree of hardware modification and a different application scenario.

The first path, Storage-Compute Anchored, replaces traditional hard drives with computational storage drives that embed a dedicated accelerator controller for large models and adapt to the high-speed CXL interface. This lets CPUs access the weights on the drive directly for inference, making it a minimal-change solution ready for immediate deployment. The second path, Core-Satellite Mapped, consists of a permanently anchored core brain and multiple dynamically expandable satellite neuron nodes. The core brain maps slices of its own capability to the satellite nodes, enabling modular capability expansion; it suits professional users who need to scale model abilities rapidly. The third path, Fully Distributed Neural Network, decomposes all cognitive functions into independent neuron modules, each with its own storage and compute unit. These modules interconnect via a peer-to-peer network topology that allows unlimited scaling, representing a long-term evolutionary hardware form for intelligent agents.

This paper systematically demonstrates the hardware implementation, working mechanisms, core advantages, and inherent limitations of each path, and proves the engineering feasibility of all paths based on existing commercial computational storage drives and the mature CXL protocol.

This research serves as the hardware culmination and final integration of the author’s previous nine papers. The prior theoretical system (from “model as operating system” to “distributed brain-like architecture”, from “personality kernel” and “portrait side domain” to “layered reflexive neuromuscular system”, from “recursive federation” to “society-level intelligence”) has reached logical completeness, yet has always been held back by one fundamental engineering loophole: no matter how sophisticated the architectural design, the model itself must be copied from storage into memory/VRAM to run, and so can never escape the parasitic constraint of von Neumann’s separation of storage and computation. The Storage-Compute Anchoring architecture proposed in this paper closes this sole loophole at the hardware root, providing a native hardware foundation for the entire original AI paradigm and moving it from a “logically conceived software simulation” to a “complete system that can be implemented natively”.
Keywords: Large Language Models; von Neumann architecture; Storage-Compute Anchoring; computational storage drive; cognitive modularity; in-situ thinking unfolding; neuron module; core-satellite architecture; distributed neuron network
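
As a loose software analogy for the in-situ weight access described in the abstract, the sketch below computes a matrix-vector product while reading the weights through an operating-system memory map instead of first copying the whole matrix into a separate buffer. It is an illustration under stated assumptions, not the paper's implementation: a memory-mapped file stands in for CXL-attached storage, and the toy file layout and the names `write_weights`, `matvec_in_place`, and `demo` are hypothetical.

```python
import mmap
import os
import struct
import tempfile

FLOAT_SIZE = 4  # bytes per little-endian float32 weight (assumed toy layout)


def write_weights(path, rows, cols):
    """Create a toy weight file: row r holds the value float(r) in every column."""
    with open(path, "wb") as f:
        for r in range(rows):
            f.write(struct.pack(f"<{cols}f", *([float(r)] * cols)))


def matvec_in_place(path, rows, cols, x):
    """Compute y = W @ x, reading W one row at a time through a memory map.

    Only the row currently in use is unpacked into Python objects; the full
    matrix is never copied into a separate in-memory buffer.
    """
    y = []
    row_bytes = cols * FLOAT_SIZE
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for r in range(rows):
                row = struct.unpack_from(f"<{cols}f", mm, r * row_bytes)
                y.append(sum(w * xi for w, xi in zip(row, x)))
    return y


def demo():
    """Build a 4x3 toy weight file, multiply it by [1, 2, 3], and clean up."""
    rows, cols = 4, 3
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_weights(path, rows, cols)
        return matvec_in_place(path, rows, cols, [1.0, 2.0, 3.0])
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

In this analogy the working set held in memory is a single row plus the output vector, which mirrors the idea of memory acting only as a working region while the weights stay where they are stored. It says nothing about real CXL latency or bandwidth.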
Tong Feng
Oldham Council
Tong Feng (Tue,) studied this question.
synapsesocial.com/papers/69cf5eee5a333a821460da79 — DOI: https://doi.org/10.5281/zenodo.19354632