This technical paper presents Rotary GPU, an exploratory execution approach for running large Mixture-of-Experts (MoE) language models locally on consumer hardware with limited GPU memory. A public validation was conducted using a Qwen3.6-35B-A3B-class MoE model on a consumer laptop whose RTX 4060 Laptop GPU has only 8 GB of VRAM. Under the primary operating configuration, the system generated 2048 output tokens while using approximately 6.3 GB of VRAM, with an observed decode throughput of 21.06 tokens per second and a 10/10 completion rate on a short smoke-set evaluation. The work derives from a previously disclosed rotary-based accelerator residency concept (Korean Patent Publication KR 10-2026-0070380). Rather than assuming that every model component must remain permanently resident in accelerator memory, the approach treats residency as a rotating resource-management problem in which sub-modules move between execution slots according to structured rotational scheduling. The paper documents externally observable validation results only; internal implementation details remain undisclosed. The objective is not to replace data-center infrastructure but to explore whether some capabilities of large models can be brought closer to environments where such infrastructure is unavailable, such as closed-network, on-premise, or resource-constrained organizations. The results are exploratory rather than definitive, and the validation package requires users to supply their own compatible model files. This paper is part of the ANIMA Research paper series by independent researcher Myeong Jun Jo (ORCID: 0009-0006-9540-4666).
MYEOUNGJUN JO studied this question.
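
The idea of treating residency as a rotating resource, rather than keeping every component permanently in accelerator memory, can be illustrated with a toy model. Because the paper leaves internal implementation details undisclosed, the sketch below is not the Rotary GPU method: the class `RotatingSlots`, its fields, the module names, and the plain round-robin replacement policy are all hypothetical, chosen only to show what a fixed pool of execution slots with rotating replacement might look like.

```python
from dataclasses import dataclass, field


@dataclass
class RotatingSlots:
    """Toy fixed-size residency pool with round-robin (rotating) replacement.

    Hypothetical illustration only: the names and the policy are assumptions,
    not the undisclosed Rotary GPU implementation.
    """
    num_slots: int
    slots: list = field(default_factory=list)  # module ids currently resident
    cursor: int = 0                            # next slot to overwrite
    loads: int = 0                             # number of simulated transfers

    def acquire(self, module_id: str) -> int:
        """Return the slot index holding module_id, loading it if needed."""
        if module_id in self.slots:
            return self.slots.index(module_id)  # already resident: no transfer
        self.loads += 1
        if len(self.slots) < self.num_slots:
            self.slots.append(module_id)        # fill an empty slot first
            return len(self.slots) - 1
        slot = self.cursor                      # otherwise overwrite in rotation
        self.slots[slot] = module_id
        self.cursor = (self.cursor + 1) % self.num_slots
        return slot


# Two slots serving a short, made-up sequence of sub-module requests.
pool = RotatingSlots(num_slots=2)
trace = ["expert_0", "expert_1", "expert_0", "expert_2", "expert_1", "expert_3"]
placements = [pool.acquire(m) for m in trace]
print(placements, pool.loads, pool.slots)
```

In this toy trace, six requests are served from two slots with four simulated loads; requests for an already resident module cost no transfer. A real system would also have to account for transfer latency, module sizes, and which sub-modules the model actually routes to, none of which this sketch models.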