We present QPP (Quantile Piecewise Perceptron), a novel parametric compression technique that reduces the number of parameters rather than only their bit precision. QPP represents each weight-matrix row as a quantile curve fitted with 32 anchors and a block-shared ordering. Combined with a 2-bit codebook over the anchors and INT8/INT4 quantization for layers incompatible with QPP, our hybrid pipeline reduces the size of Qwen3-4B by 41.1% (from 8,045 MB to 4,738 MB) while preserving coherent generation. On attention layers, QPP compresses about 7x more than GGUF Q4_K_M (21x vs 3.2x), and it runs inference 69% faster with cached weights. We also built a single physical file combining QPP and GGUF, demonstrating that the two compression axes (parameter count and bit precision) are orthogonal.
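To make "a quantile curve fitted with 32 anchors and a block-shared ordering" concrete, here is a minimal sketch of one possible reading, not the authors' implementation. It assumes that anchors sample each row's sorted values (its empirical quantile function) at evenly spaced levels, that values between anchors are recovered by linear interpolation, and that a block of rows shares one column permutation, here chosen by ranking columns by their mean (an assumed heuristic). All function names are hypothetical, and the 2-bit codebook and the INT8/INT4 fallback are omitted.

```python
from typing import List, Sequence, Tuple


def quantile_anchors(row: Sequence[float], n_anchors: int = 32) -> List[float]:
    """Sample the row's empirical quantile function at n_anchors evenly spaced levels."""
    s = sorted(row)
    n = len(s)
    anchors = []
    for k in range(n_anchors):
        pos = k * (n - 1) / (n_anchors - 1)  # fractional index into the sorted values
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        anchors.append(s[lo] * (1 - frac) + s[hi] * frac)
    return anchors


def eval_curve(anchors: Sequence[float], t: float) -> float:
    """Piecewise-linear interpolation between anchors at quantile level t in [0, 1]."""
    m = len(anchors)
    pos = t * (m - 1)
    lo = min(int(pos), m - 2)  # clamp so t == 1.0 uses the last segment
    frac = pos - lo
    return anchors[lo] * (1 - frac) + anchors[lo + 1] * frac


def shared_order(block: Sequence[Sequence[float]]) -> List[int]:
    """One permutation for the whole block: columns ranked by their mean (assumed heuristic)."""
    n = len(block[0])
    means = [sum(r[j] for r in block) / len(block) for j in range(n)]
    return sorted(range(n), key=lambda j: means[j])


def reconstruct(anchors: Sequence[float], order: Sequence[int]) -> List[float]:
    """Place the r-th smallest curve value at column order[r]."""
    n = len(order)
    row = [0.0] * n
    for r, col in enumerate(order):
        row[col] = eval_curve(anchors, r / (n - 1))
    return row


def param_count(n_rows: int, n_cols: int, n_anchors: int = 32) -> Tuple[int, int]:
    """Stored items per block: (anchor values, permutation indices)."""
    return n_rows * n_anchors, n_cols
```

Under these assumptions, a hypothetical block of 64 rows of width 2560 stores 2,048 anchor values plus 2,560 permutation indices instead of 163,840 weights. This is where the reduction in parameter count, as opposed to bit width, would come from.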
Author: Ignacio Fernando Suarez Hernandez.