What question did this study set out to answer?

This research aims to improve token efficiency in local agent frameworks through SMELT, a novel compilation system.

April 4, 2026, Open Access

SMELT: Schema-Aware Markdown Compilation for Efficient Local Token Inference

Key Points

Developed SMELT to transform markdown into a compact runtime representation.
Implemented lossless archival storage with SHA-256 verification.
Created schema-aware semantic compilation to minimize redundancy while keeping values intact.
Applied macro-level dictionary compression and selective emission based on queries.
Achieved a 6% reduction in time-to-first-token for the full startup bundle.
Demonstrated a 78 to 97% reduction in tokens during query-conditioned retrieval.
Showed that SMELT reduces prompt tokens by 94 to 98% compared to baseline markdown formats.
Validated the distinctness of byte-optimal and token-optimal compression objectives.
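
The selective-emission idea in the key points above can be illustrated with a minimal sketch. This is not SMELT's implementation: the section-splitting scheme, the word-overlap relevance test, and the names `split_sections`, `words`, and `emit_for_query` are all assumptions made for illustration.

```python
import re


def split_sections(markdown: str) -> dict[str, str]:
    """Split markdown into {heading: body} on ATX headings (hypothetical scheme)."""
    sections: dict[str, str] = {}
    heading = "_preamble"
    lines: list[str] = []
    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            # Close the previous section before starting a new one.
            sections[heading] = "\n".join(lines).strip()
            heading = match.group(1).strip()
            lines = []
        else:
            lines.append(line)
    sections[heading] = "\n".join(lines).strip()
    return sections


def words(text: str) -> set[str]:
    """Lowercased alphanumeric words; a crude stand-in for a real relevance model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def emit_for_query(markdown: str, query: str) -> dict[str, str]:
    """Return only the non-empty sections that share at least one word with the query."""
    query_words = words(query)
    selected: dict[str, str] = {}
    for heading, body in split_sections(markdown).items():
        if body and query_words & words(heading + " " + body):
            selected[heading] = body
    return selected


workspace = "# Tools\nUse ripgrep for search.\n\n# Memory\nUser prefers metric units.\n"
print(emit_for_query(workspace, "which units does the user prefer?"))
```

Only the `Memory` section is emitted for this query, so the prompt carries a fraction of the workspace. Plain word overlap also matches on common words such as "the", so a real system would need a stronger relevance signal.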

Abstract

Local agent frameworks inject human-readable markdown files directly into the language model context on every inference call, imposing a persistent token overhead consistent with quadratic prefill costs in standard self-attention. This paper presents SMELT (Schema-aware Markdown compilation for Efficient Local Token inference), a provenance-preserving compilation system that transforms agent workspace markdown into a dense, auditable runtime representation. SMELT operates across four layers: lossless archival storage with SHA-256 round-trip verification, schema-aware semantic compilation that reduces structural redundancy while preserving values, macro-level dictionary compression, and query-conditioned selective emission that delivers only the context relevant to a given prompt. Evaluated on a production OpenClaw workspace running Qwen 3.5 VL 122B A10B (8-bit, MLX) on Apple M3-Ultra hardware, SMELT achieves a measured 6% reduction in time-to-first-token on the full startup bundle and 78 to 97% token reduction on query-conditioned retrieval across ten diverse query types, with high fidelity on most tested files. Baseline comparisons show that query-conditioned SMELT reduces prompt tokens by 94 to 98% compared to raw markdown, heading-stripped markdown, and naive JSON conversion. A key empirical finding is that byte-optimal compression and token-optimal compression are distinct objectives under the tested tokenizer. The system preserves full provenance, enabling decompilation from runtime format back to the original source. SMELT treats agent context as a systems problem: source files remain human-readable; runtime context is compiled.
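
The archival layer's SHA-256 round-trip verification can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: `zlib` stands in for whatever storage codec SMELT actually uses, and the function names `archive` and `restore` are hypothetical.

```python
import hashlib
import zlib


def archive(source: bytes) -> tuple[bytes, str]:
    """Compress the source and record the SHA-256 digest of the original bytes."""
    digest = hashlib.sha256(source).hexdigest()
    return zlib.compress(source, 9), digest


def restore(blob: bytes, expected_digest: str) -> bytes:
    """Decompress and refuse to return bytes whose digest does not match."""
    restored = zlib.decompress(blob)
    if hashlib.sha256(restored).hexdigest() != expected_digest:
        raise ValueError("round-trip verification failed")
    return restored


original = "# Notes\n- prefers metric units\n".encode("utf-8")
blob, digest = archive(original)
assert restore(blob, digest) == original  # lossless round trip
```

Hashing the original bytes rather than the compressed blob means the check holds regardless of which codec sits in between, which is what makes the source recoverable and auditable.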

Cite This Study

Edmund Lister studied this question.

synapsesocial.com/papers/69d0aefd659487ece0fa4dd7 https://doi.org/10.5281/zenodo.19380351
