This work introduces ILPG, a proposed architecture for large language model (LLM) computation that explores parallel reasoning structures in place of the strictly sequential token generation used by transformer-based models. The approach investigates how AI workloads could be partitioned and executed in parallel across distributed computational resources, including idle devices and underutilized memory on the billions of connected systems worldwide. If validated at scale, this architecture could enable a model of AI infrastructure that depends less on centralized data centers and makes better use of globally distributed computational capacity. The research presents the conceptual framework, an experimental exploration, and an early evaluation of improvements in latency, coherence, and computational efficiency compared with traditional sequential pipelines.
Rafael Aquino studied this question.