Local agent frameworks inject human-readable markdown files directly into the language model context on every inference call. This imposes a persistent token overhead that compounds because prefill cost under standard self-attention scales quadratically with context length. This paper presents SMELT (Schema-aware Markdown compilation for Efficient Local Token inference), a provenance-preserving compilation system that transforms agent workspace markdown into a dense, auditable runtime representation. SMELT operates across four layers: lossless archival storage with SHA-256 round-trip verification; schema-aware semantic compilation that removes structural redundancy while preserving values; macro-level dictionary compression; and query-conditioned selective emission, which delivers only the context relevant to a given prompt. Evaluated on a production OpenClaw workspace running Qwen 3.5 VL 122B A10B (8-bit, MLX) on Apple M3 Ultra hardware, SMELT reduces measured time-to-first-token by 6% on the full startup bundle and reduces token count by 78 to 97% on query-conditioned retrieval across ten diverse query types, with high fidelity on most tested files. In baseline comparisons, query-conditioned SMELT uses 94 to 98% fewer prompt tokens than raw markdown, heading-stripped markdown, or naive JSON conversion. A key empirical finding is that byte-optimal and token-optimal compression are distinct objectives under the tested tokenizer. Because the system preserves full provenance, runtime output can be decompiled back to the original source. SMELT treats agent context as a systems problem: source files remain human-readable, while runtime context is compiled.
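To make the pairing of a lossy-looking compression layer with a lossless guarantee concrete, the following is a minimal Python sketch of macro-level dictionary compression guarded by SHA-256 round-trip verification. It is an illustration under stated assumptions, not SMELT's actual format: the function names, the `⟦n⟧` macro delimiters, and the caller-supplied phrase list are all hypothetical. Phrases are substituted in a single regex pass (longest first), and compilation is rejected unless expanding the result reproduces the exact source digest.

```python
import hashlib
import re

# Hypothetical macro delimiters; not taken from SMELT itself.
OPEN, CLOSE = "⟦", "⟧"
MACRO_RE = re.compile(r"⟦(\d+)⟧")


def sha256_hex(text: str) -> str:
    """SHA-256 digest of the UTF-8 encoding of `text`."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def compress_with_macros(source: str, phrases: list[str]) -> str:
    """Replace every occurrence of a phrase with a numbered macro token (hypothetical helper)."""
    if OPEN in source or CLOSE in source:
        # Pre-existing delimiters would make expansion ambiguous, breaking losslessness.
        raise ValueError("source already contains macro delimiters")
    if any(not p for p in phrases):
        raise ValueError("empty phrases are not allowed")
    if not phrases:
        return source
    index = {p: i for i, p in enumerate(phrases)}
    # Longest phrases first, so the ordered alternation prefers the longest match at each position.
    alternation = "|".join(re.escape(p) for p in sorted(phrases, key=len, reverse=True))
    return re.sub(alternation, lambda m: f"{OPEN}{index[m.group(0)]}{CLOSE}", source)


def expand_macros(compressed: str, phrases: list[str]) -> str:
    """Invert compress_with_macros by substituting each macro token with its phrase."""
    return MACRO_RE.sub(lambda m: phrases[int(m.group(1))], compressed)


def compile_and_verify(source: str, phrases: list[str]) -> tuple[str, str]:
    """Compress, then refuse the result unless it expands back to a byte-identical source."""
    digest = sha256_hex(source)
    compressed = compress_with_macros(source, phrases)
    if sha256_hex(expand_macros(compressed, phrases)) != digest:
        raise RuntimeError("round-trip verification failed")
    return compressed, digest


if __name__ == "__main__":
    doc = "## Rules\n- Always cite the source file.\n- Always cite the source file when quoting.\n"
    compressed, digest = compile_and_verify(doc, ["Always cite the source file"])
    print(compressed)
    print(digest)
```

Note that a substitution like this shrinks the byte count, but whether it also shrinks the token count depends on how the tokenizer splits the multi-byte delimiters and digits, which is exactly the byte-optimal versus token-optimal distinction reported above.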
Edmund Lister (Thu,) studied this question.