What question did this study set out to answer?

The aim is to develop a structured workflow for LLM-assisted engineering in a Rust microkernel OS to enhance reproducibility and auditability.

March 12, 2026, Open Access

Contract-Governed LLM Development: Reproducible AI-Assisted Engineering in a Medium-Scale Rust Microkernel OS

Key Points

Implemented a contract-governed development workflow separating contracts and execution files.
Applied deterministic proof gating including unit tests, feature gates, and a phased QEMU smoke harness.
Utilized task-scoped execution with hard stop conditions and externalized state snapshots.
Demonstrated reduction in scope creep and prevention of 'fake-green' progress with contract-first planning.
Improved auditability of LLM-driven changes through recorded proofs instead of conversational history.
Identified and mitigated failure modes like over-engineering and excessive debugging context.
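
One way task-scoped execution with a hard stop condition could be enforced is to compare the paths a change touches against a per-task allowlist and halt on any violation. The sketch below is a minimal illustration under that assumption; the `Task` struct, its `allowed_prefixes` field, and the example paths are hypothetical and not taken from the paper.

```rust
/// Hypothetical task description. The field name is illustrative only.
struct Task<'a> {
    /// Path prefixes this task is allowed to modify.
    allowed_prefixes: &'a [&'a str],
}

/// Returns the changed paths that fall outside the task's allowlist.
/// Uses plain string-prefix matching; paths are not normalized here.
fn out_of_scope<'a>(task: &Task<'_>, changed: &[&'a str]) -> Vec<&'a str> {
    changed
        .iter()
        .copied()
        .filter(|p| !task.allowed_prefixes.iter().any(|pre| p.starts_with(*pre)))
        .collect()
}

fn main() {
    // Hypothetical allowlist and change set.
    let task = Task { allowed_prefixes: &["services/logd/", "docs/rfcs/"] };
    let changed = ["services/logd/src/lib.rs", "kernel/src/sched.rs"];

    let violations = out_of_scope(&task, &changed);
    if !violations.is_empty() {
        // Treat any violation as a hard stop rather than a warning.
        eprintln!("stop: changes outside task scope: {:?}", violations);
        std::process::exit(1);
    }
}
```

Failing the run on the first out-of-scope path, rather than logging and continuing, is what makes the check a stop condition instead of advice.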

Abstract

Large language models (LLMs) can accelerate software development, but their probabilistic behavior interacts poorly with large codebases: changes drift beyond intended scope, debugging sessions accumulate costly context, and “fixes” can silently trade one failure for another. This paper presents a contract-governed workflow for LLM-assisted development in a medium-scale Rust microkernel OS targeting RISC-V. The workflow separates normative contracts (RFCs specifying interfaces, invariants, and failure models) from execution single sources of truth (task files defining scope, touched-path allowlists, stop conditions, and canonical proof commands), with ADRs capturing boundary changes. To improve reproducibility, we apply deterministic proof gating: host-first unit/contract/E2E tests, OS-slice feature and dependency hygiene gates, and a phased QEMU smoke harness that validates an ordered UART marker ladder under bounded timeouts and sequential execution discipline.

Using this structure on an OS with approximately 100 crates and on the order of 100k userspace Rust LOC (plus a ∼16k LOC Rust microkernel), we show how contract-first planning and evidence-based completion semantics reduce scope creep, prevent “fake-green” progress, and make LLM-driven changes auditable through recorded proofs rather than conversational history. We discuss failure modes observed in practice (over-engineering, destructive refactors, and debugging-induced context blowups) and show how task-scoped execution, hard stop conditions, and externalized state snapshots mitigate them in a microkernel, service-oriented architecture where subsystems can be isolated and validated independently.
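
To make the idea of an ordered UART marker ladder concrete, the following sketch checks that a list of expected markers appears in order in captured serial output, within a time budget. It is an illustration under stated assumptions, not the paper's harness: the `Ladder` enum, the `check_ladder` function, and the marker strings are all hypothetical, and the deadline is only checked when a line arrives, so a real harness would also need to bound the read itself.

```rust
use std::time::{Duration, Instant};

/// Outcome of checking serial output against an ordered marker ladder.
#[derive(Debug, PartialEq)]
enum Ladder {
    /// Every marker was seen, in order.
    Complete,
    /// Output ended first; holds the index of the first marker not reached.
    MissingAt(usize),
    /// The time budget ran out; holds the index of the marker being awaited.
    TimedOut(usize),
}

/// Scans lines in order and advances one rung per line that contains the
/// next expected marker. Markers seen out of order do not count.
fn check_ladder<I, S>(lines: I, markers: &[&str], budget: Duration) -> Ladder
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let deadline = Instant::now() + budget;
    let mut next = 0;
    for line in lines {
        if next == markers.len() {
            break; // ladder already complete, ignore the rest
        }
        if Instant::now() > deadline {
            return Ladder::TimedOut(next);
        }
        if line.as_ref().contains(markers[next]) {
            next += 1;
        }
    }
    if next == markers.len() {
        Ladder::Complete
    } else {
        Ladder::MissingAt(next)
    }
}

fn main() {
    // Hypothetical markers; a real ladder would come from the task file.
    let markers = ["BOOT: kernel up", "INIT: services ready", "SELFTEST: ok"];
    let log = "BOOT: kernel up\nnoise\nINIT: services ready\nSELFTEST: ok\n";
    let result = check_ladder(log.lines(), &markers, Duration::from_secs(5));
    println!("{:?}", result);
}
```

Reporting the index of the first unreached marker, instead of a plain pass or fail, gives a recorded proof that names the phase where the boot sequence stalled.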

Authors

Jenning Schäfer
