What question did this study set out to answer?

The aim is to develop an adaptive method to compress machine learning models for resource-constrained devices.

April 10, 2026 · Open Access

Evolving vector quantization-aware training: an adaptive method for compressing machine learning models

Key Points

Developed an adaptive method for compressing machine learning models for resource-constrained devices
Combined vector quantization with evolving clustering methods for model parameter partitioning
Validated using automotive sensor data for CO2 emission prediction
Deployed on the Macchina A0 automotive interface
Achieved 7.14× compression rate and 22% lower energy consumption
45% reduction in Flash memory usage and 30% faster inference than baseline models
Reduced root mean squared error by 18.7% compared to TensorFlow's QAT Int8

Abstract

The growing field of Tiny Machine Learning (TinyML) focuses on deploying advanced Machine Learning models on resource-constrained devices, where achieving high model performance while adhering to hardware limitations remains a key challenge. This study proposes a Quantization-Aware Training (QAT) method that combines Vector Quantization with Evolving Clustering Methods, specifically AutoCloud, DBStream, Affinity Propagation, and Mean-Shift, selected for their adaptive capabilities in dynamically partitioning model parameters during training. The method was validated using automotive sensor data for CO2 emission prediction and deployed on the Macchina A0, a compact OBD-II automotive interface widely used for real-time vehicle diagnostics. Results show a 7.14× compression rate, 22% lower energy consumption, 45% reduction in Flash memory usage, and 30% faster inference compared to baseline models. Compared to TensorFlow’s QAT Int8, the proposed approach reduced root mean squared error by 18.7%, accelerated inference by 27.4%, used 19.2% less Flash memory, and consumed 15.3% less energy. AutoCloud achieved the best trade-off between accuracy and efficiency, while Mean-Shift provided the lowest prediction error. These findings demonstrate the method’s potential for scalable and energy-efficient TinyML deployments in embedded automotive systems and other resource-constrained environments.
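The general idea of vector quantization with an adaptively sized codebook can be illustrated with a minimal sketch. This is not the authors' implementation. It substitutes a simple distance-threshold online clustering rule for the evolving methods named in the abstract (AutoCloud, DBStream, Affinity Propagation, Mean-Shift), and it runs once over fixed weights rather than inside a training loop. The function names, the `radius` threshold, the sub-vector length `dim`, and the storage model in `compression_ratio` (32-bit weights, fixed-width indices) are all assumptions made for illustration.

```python
import math
import random


def evolving_codebook(vectors, radius):
    """Build a codebook in one pass (hypothetical stand-in for an evolving clusterer).

    A vector farther than `radius` from every centroid starts a new centroid;
    otherwise the nearest centroid is updated as a running mean.
    """
    centroids, counts = [], []
    for v in vectors:
        if centroids:
            j = min(range(len(centroids)), key=lambda k: math.dist(v, centroids[k]))
            d = math.dist(v, centroids[j])
        else:
            j, d = -1, math.inf
        if d > radius:
            centroids.append(list(v))
            counts.append(1)
        else:
            counts[j] += 1
            centroids[j] = [c + (x - c) / counts[j] for c, x in zip(centroids[j], v)]
    return centroids


def quantize(vectors, centroids):
    """Replace each sub-vector by the index of its nearest codebook entry."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(v, centroids[k]))
        for v in vectors
    ]


def compression_ratio(n_weights, dim, n_codes, weight_bits=32):
    """Original size divided by (index storage + codebook storage).

    Assumes n_weights is divisible by dim and indices use a fixed bit width.
    """
    index_bits = max(1, math.ceil(math.log2(n_codes)))
    compressed_bits = (n_weights // dim) * index_bits + n_codes * dim * weight_bits
    return (n_weights * weight_bits) / compressed_bits


# Toy usage: synthetic "layer weights" split into sub-vectors of length 2.
random.seed(0)
weights = [random.gauss(0.0, 0.5) for _ in range(1024)]
dim = 2
vectors = [weights[i:i + dim] for i in range(0, len(weights), dim)]

codebook = evolving_codebook(vectors, radius=0.4)
codes = quantize(vectors, codebook)
restored = [x for c in codes for x in codebook[c]]  # dequantized weights
ratio = compression_ratio(len(weights), dim, len(codebook))
print(len(codebook), round(ratio, 2))
```

In this sketch the number of codebook entries is not fixed in advance. It grows with the spread of the weights, which is the adaptive property the abstract attributes to evolving clustering. A smaller `radius` yields more entries, lower reconstruction error, and a lower compression ratio.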
