This paper develops a rigorous comparative distortion theory for covariance-dominated delta encoding in the context of activation compression for large language model (LLM) inference systems. The central question is whether delta encoding, which represents each activation vector as the difference from the previous reconstructed state, can yield a provable distortion advantage over direct quantization of absolute activations under a fixed quantization architecture. Two complementary theorem lines are established. The first is a fully rigorous finite-bit coordinate-wise distortion theory: for quantization in a covariance eigen-basis with explicit clipping ranges and bit allocations, operational distortion upper bounds are derived under both bounded-support and clipped finite-second-moment models. Under uniformly matched bounded-support designs, Loewner-order covariance domination yields a strict comparative upper-bound advantage for delta representations. In the clipped regime, covariance domination alone is shown to control only the covariance-scale term, and an additional tail-domination assumption is isolated as necessary for a full comparative statement. The second theorem line is a conditional comparative high-resolution product quantization (PQ) theorem: under a shared high-resolution operational regime and fixed PQ architecture, architecture-dependent constants and common rate factors cancel in the distortion ratio, yielding a determinant-controlled bound of the form D_delta ≤ α · (1+η)/(1−η) · D_V. The paper identifies the joint condition α · (1+η)/(1−η) < 1 as the operative requirement for a meaningful comparative advantage, and explains that the approximation parameter η absorbs source-shape variability beyond asymptotic high-rate corrections.
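The joint condition on α and η can be checked numerically. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and α is computed here as the smallest Loewner-order domination factor (the largest generalized eigenvalue of the two covariances), which is an assumption. The paper's high-resolution PQ line describes its bound as determinant-controlled and may define α differently.

```python
import numpy as np

def loewner_domination_factor(cov_delta, cov_abs):
    """Smallest alpha with cov_delta <= alpha * cov_abs in Loewner order.

    Assumes cov_abs is symmetric positive definite (hypothetical helper).
    """
    # Whiten with the Cholesky factor: cov_abs = L @ L.T
    L = np.linalg.cholesky(cov_abs)
    X = np.linalg.solve(L, cov_delta)   # L^{-1} cov_delta
    M = np.linalg.solve(L, X.T).T       # L^{-1} cov_delta L^{-T}
    M = 0.5 * (M + M.T)                 # symmetrize against round-off
    return float(np.linalg.eigvalsh(M).max())

def has_comparative_advantage(alpha, eta):
    """Check the joint condition alpha * (1 + eta) / (1 - eta) < 1."""
    if not 0.0 <= eta < 1.0:
        raise ValueError("eta must lie in [0, 1)")
    return alpha * (1.0 + eta) / (1.0 - eta) < 1.0

# Toy covariances, chosen only for illustration.
cov_abs = np.diag([4.0, 1.0])
cov_delta = np.diag([1.0, 0.5])
alpha = loewner_domination_factor(cov_delta, cov_abs)

advantage_small_eta = has_comparative_advantage(alpha, eta=0.1)
advantage_large_eta = has_comparative_advantage(alpha, eta=0.4)
```

With these toy inputs the same α passes the condition at η = 0.1 and fails it at η = 0.4, which shows why α < 1 alone is not sufficient.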
The paper also addresses the operational fixed-point structure created by the recursive definition of the delta source: because the delta covariance depends on the pipeline's own reconstruction quality, covariance-domination factors must be calibrated under steady-state pipeline operation rather than using ground-truth reference states, which would systematically underestimate the true domination factor. This work is motivated by empirical observations in residual-based KV-cache compression (DeltaKV, 2025) and transform-domain KV quantization (TurboAngle, 2025), which show that delta representations exhibit smaller covariance traces and flatter eigenvalue spectra. The present paper provides the mathematical framework that makes this intuition precise, without relying on those empirical observations as proof. This is version 2 of the preprint. Version 1 (https://doi.org/10.5281/zenodo.19440450) used broader rate-distortion language, an informal quantizer model without explicit clipping or finite-bit structure, and included an extended empirical motivation section reviewing the KV-cache compression literature. Version 2 introduces a precisely defined clipped finite-bit quantizer model, separates two distinct theorem lines with their own assumptions and proofs, standardizes the distinction between actual distortions and derived upper bounds throughout, and adds explicit treatment of the operational fixed-point structure and the joint calibration condition on α and η.
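The calibration point about the operational fixed-point structure can be illustrated with a scalar toy simulation. Everything in this sketch is an assumption made for illustration and is not the paper's setup: the source is a first-order autoregressive process, the quantizer is an unclipped uniform rounding quantizer, and the domination factor is reduced to a ratio of sample variances. It compares deltas taken against ground-truth previous states with deltas taken against the pipeline's own reconstructions.

```python
import numpy as np

def simulate_delta_variances(rho=0.9, step=1.0, n=50_000, seed=0):
    """Return (alpha_reference, alpha_pipeline) for a scalar AR(1) toy source.

    alpha_reference: var(x_t - x_{t-1}) / var(x_t), using ground-truth states.
    alpha_pipeline:  var(x_t - xhat_{t-1}) / var(x_t), using reconstructed states.
    All parameters are hypothetical illustration values.
    """
    rng = np.random.default_rng(seed)
    x_prev, xhat_prev = 0.0, 0.0
    xs, open_loop, closed_loop = [], [], []
    for _ in range(n):
        x = rho * x_prev + rng.standard_normal()
        open_loop.append(x - x_prev)       # delta against the true previous state
        d = x - xhat_prev                  # delta against the reconstructed state
        closed_loop.append(d)
        # Uniform quantization of the delta, then decoder-side state update.
        xhat_prev = xhat_prev + step * np.round(d / step)
        x_prev = x
        xs.append(x)
    var_x = np.var(xs)
    return np.var(open_loop) / var_x, np.var(closed_loop) / var_x

alpha_reference, alpha_pipeline = simulate_delta_variances()
print(f"alpha from ground-truth states:  {alpha_reference:.3f}")
print(f"alpha from reconstructed states: {alpha_pipeline:.3f}")
```

In this toy setting the closed-loop delta carries the previous step's quantization error in addition to the true increment, so the factor measured from reconstructed states comes out larger than the one measured from ground-truth states.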
Bo Jun Han
Bo Jun Han (Tue,) studied this question.
www.synapsesocial.com/papers/69d893c96c1944d70ce04c51 — DOI: https://doi.org/10.5281/zenodo.19451343