This work introduces a semantic-aware execution strategy for GPU workloads that reduces memory usage by 82.31% in the reported experiments through a data-centric optimization pipeline. Instead of relying on static scheduling, the approach restructures execution graphs according to semantic dependencies. This enables controlled memory reuse, lowers allocation pressure, and improves computational throughput without requiring specialized hardware. The results indicate that large-scale GPU workloads, traditionally dependent on high-memory cards, can run on lower-resource devices when the execution model is built around semantic dependencies rather than brute-force allocation. We describe the execution algorithm, the memory model, the experimental results, and the implications for democratizing high-performance computing. This preprint is part of the Node Zero Research Division, which focuses on sovereign AI computation and accessible GPU optimization.
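The abstract does not spell out the execution algorithm. To make "controlled memory reuse" over an execution graph concrete, the sketch below uses a generic, well-known technique: liveness-based greedy buffer reuse. This is a swapped-in illustration, not the author's semantic-aware pipeline. The `Op` type and the `last_uses` and `plan` functions are hypothetical names. The sketch assumes ops are already in a valid topological order and that each op produces exactly one tensor.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str          # hypothetical: name of the tensor this op produces
    size: int          # bytes of the produced tensor
    inputs: list[str]  # names of tensors this op reads


def last_uses(ops):
    """Map each tensor to the index of the last op that reads it."""
    last = {}
    for i, op in enumerate(ops):
        # A tensor that is never read dies right after it is produced.
        last.setdefault(op.name, i)
        for t in op.inputs:
            last[t] = i
    return last


def plan(ops):
    """Greedy buffer reuse. Returns (naive_bytes, pooled_bytes, assignment)."""
    last = last_uses(ops)
    free = []       # ids of buffers available for reuse
    capacity = {}   # buffer id -> bytes
    assign = {}     # tensor name -> buffer id
    for i, op in enumerate(ops):
        # Best fit: the smallest free buffer that is large enough.
        fits = [b for b in free if capacity[b] >= op.size]
        if fits:
            buf = min(fits, key=lambda b: capacity[b])
            free.remove(buf)
        else:
            buf = len(capacity)       # allocate a new buffer
            capacity[buf] = op.size
        assign[op.name] = buf
        # Release buffers only after the output is placed, so an op
        # never writes into a buffer it is still reading.
        for t in set(op.inputs) | {op.name}:
            if last[t] == i:
                free.append(assign[t])
    naive = sum(op.size for op in ops)  # one buffer per tensor, no reuse
    return naive, sum(capacity.values()), assign


ops = [Op("a", 100, []), Op("b", 100, ["a"]), Op("c", 100, ["b"]), Op("d", 100, ["c"])]
naive, pooled, assign = plan(ops)
print(naive, pooled)  # 400 200
```

In this toy chain, only two tensors are ever live at once, so two 100-byte buffers are enough where naive allocation needs four. The sketch does not reproduce the 82.31% figure. It only shows how dependency information determines when a buffer can be recycled.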
Author: Emmanuel Sánchez Pache.