Large language models (LLMs) can accelerate software development, but their probabilistic behaviorinteracts poorly with large codebases: changes drift beyond intended scope, debugging sessionsaccumulate costly context, and “fixes” can silently trade one failure for another. This paper presentsa contract-governed workflow for LLM-assisted development in a medium-scale Rust microkernelOS targeting RISC-V. The workflow separates normative contracts (RFCs specifying interfaces,invariants, and failure models) from execution single sources of truth (task files defining scope,touched-path allowlists, stop conditions, and canonical proof commands), with ADRs capturingboundary changes. To improve reproducibility, we apply deterministic proof gating: host-firstunit/contract/E2E tests, OS-slice feature and dependency hygiene gates, and a phased QEMU smokeharness that validates an ordered UART marker ladder under bounded timeouts and sequentialexecution discipline.Using this structure on an OS with approximately ∼ 100 crates and on the order of ∼ 100k userspaceRust LOC (plus a ∼ 16k LOC Rust microkernel), we show how contract-first planning and evidence-based completion semantics reduce scope creep, prevent “fake-green” progress, and make LLM-drivenchanges auditable through recorded proofs rather than conversational history. We discuss failuremodes observed in practice—over-engineering, destructive refactors, and debugging-induced contextblowups—and show how task-scoped execution, hard stop conditions, and externalized state snapshotsmitigate them in a microkernel, service-oriented architecture where subsystems can be isolated andvalidated independently.
Building similarity graph...
Analyzing shared references across papers
Loading...
Jenning Schäfer
Building similarity graph...
Analyzing shared references across papers
Loading...
Jenning Schäfer (Tue,) studied this question.
synapsesocial.com/papers/69b25b0996eeacc4fcec95eb — DOI: https://doi.org/10.5281/zenodo.18941283