Low-bit quantization for large language models has advanced rapidly through binary, ternary, post-training, and mixed-precision methods, yet the design space remains organized mainly around integer bit-width ladders or special-case ternary schemes. We introduce multi-trit quantization as a theory-first framework that generalizes ternary weight representations into balanced N-trit expansions, significance-ordered trit-plane decompositions, and mixed-layer trit-depth allocation. In the proposed formulation, each quantized weight is represented by a scale parameter multiplied by a significance-ordered sum of signed ternary digits drawn from the set {-1, 0, +1}. We show that this family induces precision operating points at rates of N log2(3) bits per weight, yields representable sets of exactly 3^N values, and supports fixed-scale nested representable families together with a canonical digit-wise decomposition structure. At fixed scale, the scalar representable set coincides with a symmetric uniform grid of 3^N levels; the value of the framework lies not in denying that scalar fact, but in exposing three things: a structured ternary decomposition, canonical digit-wise truncation with bounded remainder relations, and a mixed-layer allocation variable based on radix depth rather than binary bit-width alone. Fixed-depth examples such as T3, T5, and T7 should therefore be read as illustrative operating points or controlled baselines, not as evidence that a single global trit depth is the intended deployment policy. We position prior ternary LLM work, trit-plane coding, non-uniform quantization, and mixed-precision allocation as supporting precedent rather than direct realizations of the full framework. The paper therefore claims not empirical superiority, but the formal definition and positioning of a new research program for LLM quantization and future trit-native architectures.
Jeff Mcgillis Heiden (Mon,) studied this question.
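
The balanced N-trit representation described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes the significance weights are powers of 3 with the most significant trit first, and it uses a plain round-to-nearest quantizer with clamping as a stand-in for whatever quantizer the full framework prescribes. All function names (`to_balanced_trits`, `from_trits`, `quantize`, `dequantize`, `truncate`, `bits_per_weight`) are hypothetical.

```python
import math


def to_balanced_trits(k: int, n: int) -> list[int]:
    """Balanced-ternary digits of integer k, most significant first.

    Assumes digit weights 3^(n-1), ..., 3^1, 3^0 (an illustrative choice).
    """
    half = (3**n - 1) // 2  # largest magnitude representable with n trits
    if not -half <= k <= half:
        raise ValueError(f"{k} is outside the {n}-trit range [-{half}, {half}]")
    digits = []
    for _ in range(n):
        r = k % 3           # Python's % returns 0, 1, or 2 here
        if r == 2:
            r = -1          # map residue 2 to the signed digit -1
        digits.append(r)
        k = (k - r) // 3    # exact division after removing the digit
    return digits[::-1]


def from_trits(trits: list[int]) -> int:
    """Integer value of a most-significant-first list of trits."""
    value = 0
    for t in trits:
        value = 3 * value + t
    return value


def quantize(w: float, scale: float, n: int) -> list[int]:
    """Round w/scale to the nearest grid index, clamp, and expand into trits.

    Round-to-nearest is an assumption of this sketch, not the paper's method.
    """
    half = (3**n - 1) // 2
    k = max(-half, min(half, round(w / scale)))
    return to_balanced_trits(k, n)


def dequantize(trits: list[int], scale: float) -> float:
    """Scale parameter times the significance-ordered sum of trits."""
    return scale * from_trits(trits)


def truncate(trits: list[int], m: int) -> int:
    """Grid value obtained by keeping only the m most significant trits."""
    return from_trits(trits[:m]) * 3 ** (len(trits) - m)


def bits_per_weight(n: int) -> float:
    """Rate of an n-trit code in bits: n * log2(3)."""
    return n * math.log2(3)


if __name__ == "__main__":
    n = 3
    half = (3**n - 1) // 2
    levels = sorted(from_trits(to_balanced_trits(k, n))
                    for k in range(-half, half + 1))
    print(len(levels), levels[0], levels[-1])
    print(quantize(0.37, 0.1, n), bits_per_weight(n))
```

Under these assumptions, the 3-trit code enumerates the 27 consecutive integers from -13 to 13, which is the symmetric uniform grid the abstract refers to. Dropping the lowest N - m trits leaves a remainder of magnitude at most (3^(N-m) - 1)/2 grid steps, a standard property of balanced ternary that gives one concrete reading of the "bounded remainder" relation.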