What question did this study set out to answer?

The aim is to enhance signal compression while balancing accuracy and privacy using multiple tokens in a class-discriminant subspace.

June 24, 2026 | Open Access

Residual Vector Quantization Within a Class-Discriminant Subspace: Tunable Multi-Token Extreme Compression of Signals with a Calibratable Accuracy–Privacy–Rate Frontier

Q: What is the clinical evidence from this study?

Study design: Other. Population: ECG signals (n≈6,380). Intervention: discriminant-subspace residual vector quantization (RVQ) vs. ambient-feature-space RVQ. Primary outcome: macro-AUC (difference +0.041, 95% CI +0.029 to +0.059, p≈0.000).

Key result

Discriminant-subspace residual vector quantization outperformed ambient-feature-space RVQ for ECG classification (macro-AUC 0.8101 vs 0.7692; difference +0.041, 95% CI +0.029 to +0.059; p≈0.000).

Key points

Extended residual vector quantization within a class-discriminant subspace for multiple tokens
Balanced clinical dataset containing 12-lead ECG signals (n≈6,380)
Paired comparisons of macro-AUC across diverse compression techniques and datasets.
Discriminant-subspace RVQ outperformed ambient-feature-space RVQ at a matched 18-bit budget by +0.041 macro-AUC (0.8101 vs 0.7692, p≈0.000)
Discriminant 12-dimensional subspace provided a gain of +0.072 over a dimensionality-matched subspace (PCA)
A two-token scheme is recommended for optimal tradeoff between accuracy and re-identification privacy.

Structured PICO

Population

Balanced real 12-lead clinical ECG cohort (PTB-XL, five superclasses, n≈6,380) and a second dataset (MIT-BIH)

Intervention

Residual vector quantization (RVQ) performed within a supervised class-discriminant subspace (multi-token compression)

Comparator

Ambient-feature-space RVQ (EnCodec/SoundStream-style prior-art baseline) at a matched 18-bit budget

Outcome

Macro-AUC for classification

Performing residual vector quantization within a class-discriminant subspace significantly improves downstream classification macro-AUC on compressed ECG compared to ambient-space RVQ at the same bit rate, offering a tunable accuracy-privacy-rate tradeoff.

Numerical result

Effect estimate: difference +0.041 macro-AUC (95% CI +0.029 to +0.059)

Absolute values: macro-AUC 0.8101 vs 0.7692

p-value: p≈0.000
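The abstract describes this estimate as a paired-bootstrap result. A minimal sketch of how such an interval can be obtained follows, assuming a percentile interval and a one-sided p-value (the source does not state which conventions were used). The function names, the synthetic scores, and the plain binary AUC used as a stand-in for macro-AUC are all illustrative assumptions, not the authors' code.

```python
import random

def auc(y, s):
    """Binary AUC by pairwise comparison (stand-in for macro-AUC in this sketch)."""
    pos = [v for t, v in zip(y, s) if t == 1]
    neg = [v for t, v in zip(y, s) if t == 0]
    if not pos or not neg:
        return 0.5  # assumption: treat a one-class resample as chance level
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def paired_bootstrap_diff(metric, y_true, scores_a, scores_b, n_boot=2000, seed=0):
    """Percentile 95% CI and one-sided p-value for metric(A) - metric(B).

    "Paired" means both systems are scored on the same resampled records,
    so record-level variation cancels in the difference.
    """
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        y = [y_true[i] for i in idx]
        a = [scores_a[i] for i in idx]
        b = [scores_b[i] for i in idx]
        diffs.append(metric(y, a) - metric(y, b))
    diffs.sort()
    lo = diffs[int(0.025 * n_boot)]
    hi = diffs[int(0.975 * n_boot) - 1]
    p = sum(d <= 0 for d in diffs) / n_boot  # share of resamples with no gain
    return lo, hi, p

# Synthetic example: system A has less score noise than system B.
rng = random.Random(1)
y = [i % 2 for i in range(60)]
a = [t + rng.gauss(0, 0.8) for t in y]
b = [t + rng.gauss(0, 1.6) for t in y]
lo, hi, p = paired_bootstrap_diff(auc, y, a, b, n_boot=500)
print(f"difference 95% CI: ({lo:+.3f}, {hi:+.3f}), p = {p:.3f}")
```

Under this convention, a reported p≈0.000 would mean that essentially no resample showed the comparator matching or beating the intervention.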

Limitations

Multi-token depth gain over the single token is realized by lookup and linear downstream heads but not by higher-capacity heads
Multi-token mode trades re-identification privacy for accuracy

Abstract

A single discrete token (≈10 bits) compresses an information-rich signal record by ~1000× but saturates: one token cannot close the gap to an uncompressed classifier. The companion work (Paper 19, Parent N) showed that building the single-token codebook within a supervised class-discriminant subspace recovers a significant fraction of that "compression tax." Here we extend the construction to multiple tokens by residual vector quantization (RVQ) performed within the discriminant subspace: the first token is the nearest-centroid index of the projected feature vector, and each further token quantizes the successive in-subspace residual. On a balanced real 12-lead clinical ECG cohort (PTB-XL, five superclasses, n≈6,380), discriminant-subspace RVQ outperforms an otherwise-identical ambient-feature-space RVQ (the EnCodec/SoundStream-style prior-art baseline) at a matched 18-bit budget by +0.041 macro-AUC (0.8101 vs 0.7692; paired-bootstrap p≈0.000, 95% CI +0.029 to +0.059), replicating on a second dataset (MIT-BIH, +0.035). A subspace ablation isolates the mechanism: a *dimensionality-matched* unsupervised 12-component PCA subspace is worse than the 120-dimensional ambient space (−0.031), while the discriminant 12-dimensional subspace is +0.072 above the PCA subspace. The discriminant axes, not dimensionality reduction, drive the gain. The accuracy gain concentrates in the first residual token and saturates at a small token depth; discriminant-RVQ Pareto-dominates ambient-RVQ at every depth. The number of tokens is a tunable operating point on a frontier relating downstream accuracy, re-identification privacy, and bit rate (the single-token point being privacy-preferred). The normalized shape of this frontier is dataset-invariant (a normalized privacy-approach fraction of ≈1.2 at two tokens on both PTB-XL and MIT-BIH), so an operating point may be calibrated from two measured anchors. Residual depth further reaches resolutions that a single flat codebook of equal bits cannot feasibly train.
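The construction described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the projection matrix `W` stands in for a fitted class-discriminant basis, the two small codebooks are hand-picked rather than trained, and all function names are hypothetical.

```python
from math import log2

def project(W, x):
    """Project feature vector x onto the subspace whose basis vectors are the rows of W."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def nearest(codebook, z):
    """Index of the centroid closest to z in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], z)))

def rvq_encode(W, codebooks, x):
    """One token per stage; each stage quantizes the residual left by the previous stage."""
    residual = project(W, x)
    tokens = []
    for cb in codebooks:
        k = nearest(cb, residual)
        tokens.append(k)
        residual = [r - c for r, c in zip(residual, cb[k])]
    return tokens

def rvq_decode(codebooks, tokens):
    """Sum the selected centroids; a prefix of the tokens gives a coarser estimate."""
    z = [0.0] * len(codebooks[0][0])
    for cb, k in zip(codebooks, tokens):
        z = [a + b for a, b in zip(z, cb[k])]
    return z

def bit_budget(codebooks):
    """Bits per record under fixed-length coding: sum of log2(codebook size)."""
    return sum(log2(len(cb)) for cb in codebooks)

# Hypothetical 3-D feature space and 2-D "discriminant" subspace (assumed, not fitted).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
coarse = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]]    # first-token codebook
fine = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]  # residual codebook
codebooks = [coarse, fine]

x = [3.2, 0.9, 7.0]
tokens = rvq_encode(W, codebooks, x)
print(tokens)                              # [1, 1]
print(rvq_decode(codebooks, tokens[:1]))   # [4.0, 0.0]  (first token only)
print(rvq_decode(codebooks, tokens))       # [3.0, 1.0]  (both tokens)
print(bit_budget(codebooks))               # 4.0
```

Under fixed-length coding, T tokens drawn from codebooks of size K cost T·log2(K) bits, so two tokens at K = 64 would cost 12 bits. Passing the same codebook object at every stage gives a shared-codebook variant. Decoding only a prefix of the tokens yields a coarser in-subspace estimate, which is one way to read a progressively decodable token stream.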
We report a recommended recipe (two tokens, codebook size ≈64, a single shared codebook, an in-subspace residual), a kernel-discriminant-subspace variant that stacks with residual depth (highest observed single-record macro-AUC 0.8175), a centroid-noise privacy mitigation that returns the multi-token stream to single-token re-identification resistance, an embedded/progressively-decodable token stream served by one downstream model across rates, and a built-in Mahalanobis/conformal novelty monitor that strengthens with depth. We report honestly that the multi-token *depth* gain over the single token is realized by lookup and linear downstream heads but not by higher-capacity heads, and that the multi-token mode trades re-identification privacy for accuracy. Every threshold was frozen before data examination; negatives are reported verbatim.

Keywords / index terms: residual vector quantization; multi-token compression; class-discriminant subspace; kernel discriminant analysis; rate–relevance frontier; accuracy–privacy tradeoff; progressive/embedded coding; novelty detection; electrocardiogram; pre-registration; spiral-domain encoder; H-pipeline.

References:
1. Y. Linde, A. Buzo, and R. Gray, "An algorithm for vector quantizer design," IEEE Trans. Communications, 1980.
2. A. van den Oord, O. Vinyals, and K. Kavukcuoglu, "Neural discrete representation learning (VQ-VAE)," NeurIPS, 2017.
3. N. Zeghidour et al., "SoundStream: an end-to-end neural audio codec," IEEE/ACM TASLP, 2021.
4. A. Défossez, J. Copet, G. Synnaeve, and Y. Adi, "High fidelity neural audio compression (EnCodec)," 2022.
5. R. A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, 1936.
6. G. Baudat and F. Anouar, "Generalized discriminant analysis using a kernel approach," Neural Computation, 2000.
7. N. Tishby, F. Pereira, and W. Bialek, "The information bottleneck method," 1999.
8. V. Vovk, A. Gammerman, and G. Shafer, Algorithmic Learning in a Random World, Springer, 2005.
9. P. Wagner et al., "PTB-XL, a large publicly available electrocardiography dataset," Scientific Data, 2020.
10. G. Moody and R. Mark, "The impact of the MIT-BIH arrhythmia database," IEEE EMB Magazine, 2001.
11. B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap, Chapman [...]

Parent N, U.S. Provisional Application No. 64/095,354, filed 2026-06-21 (the single-token foundation). Both build on the spiral-domain H-pipeline applications (Parents H/I/J/K/L/M). Licensing inquiries: Randolph James Ferlic, M.D., randolphf@fieldstoneanalyticsllc.com. Reproducibility archive released under CC-BY 4.0.


Cite This Study

Ferlic et al. conducted a study (design: other) of ECG signals (n≈6,380). Discriminant-subspace residual vector quantization (RVQ) vs. ambient-feature-space RVQ was evaluated on macro-AUC (difference +0.041, 95% CI +0.029 to +0.059, p≈0.000). Discriminant-subspace RVQ outperformed ambient-feature-space RVQ for ECG classification (macro-AUC 0.8101 vs 0.7692).

synapsesocial.com/papers/6a3c2323d15afadd906f9d7c https://doi.org/10.5281/zenodo.20802825
