What question did this study set out to answer?

The aim is to develop a structured workflow for LLM-assisted engineering in a Rust microkernel OS to enhance reproducibility and auditability.

March 12, 2026 · Open Access

Contract-Governed LLM Development: Reproducible AI-Assisted Engineering in a Medium-Scale Rust Microkernel OS

Key Points

Implemented a contract-governed development workflow separating contracts and execution files.
Applied deterministic proof gating including unit tests, feature gates, and a phased QEMU smoke harness.
Utilized task-scoped execution with hard stop conditions and externalized state snapshots.
Demonstrated reduction in scope creep and prevention of 'fake-green' progress with contract-first planning.
Improved auditability of LLM-driven changes through recorded proofs instead of conversational history.
Identified and mitigated failure modes like over-engineering and excessive debugging context.
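The task-scoped execution and hard stop conditions listed above can be illustrated with a small sketch. The source does not show the task file format, so everything here is an assumption made for illustration: the `Task` struct, its field names, the `check_scope` function, and the choice of a changed-file limit as the stop condition are all hypothetical.

```rust
/// Hypothetical in-memory form of a task file. The field names are
/// illustrative and are not taken from the paper.
pub struct Task<'a> {
    pub id: &'a str,
    /// Path prefixes the task may touch. Each prefix should end in '/'
    /// so that "services/net/" does not also match "services/network/".
    pub allowed_prefixes: Vec<&'a str>,
    /// Assumed stop condition: halt when a change set grows past this size.
    pub max_changed_files: usize,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Proceed,
    /// Hard stop with a human-readable reason.
    Stop(String),
}

/// Checks a list of changed paths against the task's allowlist and its
/// changed-file limit. The first violation found produces a hard stop.
pub fn check_scope(task: &Task, changed: &[&str]) -> Verdict {
    for path in changed {
        let allowed = task
            .allowed_prefixes
            .iter()
            .any(|prefix| path.starts_with(*prefix));
        if !allowed {
            return Verdict::Stop(format!(
                "{}: path outside allowlist: {}",
                task.id, path
            ));
        }
    }
    if changed.len() > task.max_changed_files {
        return Verdict::Stop(format!(
            "{}: {} changed files exceeds limit of {}",
            task.id,
            changed.len(),
            task.max_changed_files
        ));
    }
    Verdict::Proceed
}
```

A check of this kind could run before any proof command, so that an out-of-scope edit is rejected before tests are even attempted. Plain prefix matching is the simplest possible allowlist; a real workflow might use glob patterns instead.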

Abstract

Large language models (LLMs) can accelerate software development, but their probabilistic behavior interacts poorly with large codebases: changes drift beyond intended scope, debugging sessions accumulate costly context, and “fixes” can silently trade one failure for another. This paper presents a contract-governed workflow for LLM-assisted development in a medium-scale Rust microkernel OS targeting RISC-V. The workflow separates normative contracts (RFCs specifying interfaces, invariants, and failure models) from execution single sources of truth (task files defining scope, touched-path allowlists, stop conditions, and canonical proof commands), with ADRs capturing boundary changes. To improve reproducibility, we apply deterministic proof gating: host-first unit/contract/E2E tests, OS-slice feature and dependency hygiene gates, and a phased QEMU smoke harness that validates an ordered UART marker ladder under bounded timeouts and sequential execution discipline.

Using this structure on an OS with approximately 100 crates and on the order of 100k userspace Rust LOC (plus a roughly 16k LOC Rust microkernel), we show how contract-first planning and evidence-based completion semantics reduce scope creep, prevent “fake-green” progress, and make LLM-driven changes auditable through recorded proofs rather than conversational history. We discuss failure modes observed in practice (over-engineering, destructive refactors, and debugging-induced context blowups) and show how task-scoped execution, hard stop conditions, and externalized state snapshots mitigate them in a microkernel, service-oriented architecture where subsystems can be isolated and validated independently.
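To make the "ordered UART marker ladder" idea concrete, the following sketch checks captured serial output against an expected sequence of markers. It is not the paper's harness: the function `check_ladder`, the `LadderError` type, and any marker strings are hypothetical. It also assumes the log has already been captured, so launching QEMU and enforcing the bounded timeouts are left out.

```rust
#[derive(Debug, PartialEq)]
pub enum LadderError {
    /// A later marker appeared before the one that was expected next.
    OutOfOrder { expected: String, found: String },
    /// The log ended before this marker was seen.
    Missing(String),
}

/// Verifies that every marker in `ladder` appears in `log_lines` in the
/// given order. Lines that contain no pending marker are ignored as noise.
pub fn check_ladder(log_lines: &[&str], ladder: &[&str]) -> Result<(), LadderError> {
    let mut next = 0; // index of the marker we are waiting for
    for line in log_lines {
        if next == ladder.len() {
            break; // whole ladder already satisfied
        }
        if line.contains(ladder[next]) {
            next += 1;
            continue;
        }
        // A marker from further up the ladder showing up early is a failure,
        // not something to wait out.
        if let Some(later) = ladder[next + 1..].iter().find(|m| line.contains(**m)) {
            return Err(LadderError::OutOfOrder {
                expected: ladder[next].to_string(),
                found: later.to_string(),
            });
        }
    }
    match ladder.get(next) {
        Some(missing) => Err(LadderError::Missing(missing.to_string())),
        None => Ok(()),
    }
}
```

Treating an early marker as an error, rather than skipping ahead, is one way to keep a run from looking green when an intermediate phase never reported in. In a live harness, the `Missing` case would typically be raised when the timeout for that phase expires rather than at end of log.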
