Large language models (LLMs) can accelerate software development, but their probabilistic behaviorinteracts poorly with large codebases: changes drift beyond intended scope, debugging sessionsaccumulate costly context, and “fixes” can silently trade one failure for another. This paper presentsa contract-governed workflow for LLM-assisted development in a medium-scale Rust microkernelOS targeting RISC-V. The workflow separates normative contracts (RFCs specifying interfaces,invariants, and failure models) from execution single sources of truth (task files defining scope,touched-path allowlists, stop conditions, and canonical proof commands), with ADRs capturingboundary changes. To improve reproducibility, we apply deterministic proof gating: host-firstunit/contract/E2E tests, OS-slice feature and dependency hygiene gates, and a phased QEMU smokeharness that validates an ordered UART marker ladder under bounded timeouts and sequentialexecution discipline.Using this structure on an OS with approximately ∼ 100 crates and on the order of ∼ 100k userspaceRust LOC (plus a ∼ 16k LOC Rust microkernel), we show how contract-first planning and evidence-based completion semantics reduce scope creep, prevent “fake-green” progress, and make LLM-drivenchanges auditable through recorded proofs rather than conversational history. We discuss failuremodes observed in practice—over-engineering, destructive refactors, and debugging-induced contextblowups—and show how task-scoped execution, hard stop conditions, and externalized state snapshotsmitigate them in a microkernel, service-oriented architecture where subsystems can be isolated andvalidated independently.
Jenning Schäfer (Tue,) studied this question.
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: