The classical operating system abstraction, built around deterministic instruction execution and thread scheduling, fits modern foundation models poorly: their workloads are probabilistic, parameter-heavy, and memory-bandwidth-bound. This paper proposes an AI-native operating system whose primitive scheduling unit is an inference request routed across a heterogeneous federation of language and perception models, rather than a thread bound to a single core. The emergence of custom inference silicon (such as OpenAI's Jalapeño processor) and standardized agent-coordination protocols makes this architectural shift both possible and necessary. We present SlyOS, a layered reference architecture comprising an agentic application layer, a cognitive scheduler and orchestration plane, an inference runtime, and heterogeneous edge accelerators with optional cloud burst capacity. The core contribution is a formalization of inference scheduling as continuous multi-objective optimization under latency, energy, and integrity constraints. Our placement model, which has provable convergence, operates under intermittent connectivity and treats model selection and fallback as deterministic optimization problems rather than heuristics. At the runtime layer, we establish the key-value cache as the true scheduling bottleneck and develop memory-management subsystems for long-lived agentic sessions, with cross-session persistence and tiered placement. A central design principle is continuous runtime attestation of heterogeneous inference pipelines, since untrusted edge nodes require trust verification beyond boot-time validation. We distinguish genuinely orchestrated inference from cosmetic model chaining, present the full system stack including fault-tolerance schemes for offline-first operation, position the design against existing infrastructure, and identify open problems in the memory hierarchy and thermal envelopes.
The architecture is presented with reproducible system diagrams, scheduling-flow visualizations, and quantitative comparisons.
Emil (Sun,) studied this question.