The growing field of Tiny Machine Learning (TinyML) focuses on deploying advanced Machine Learning models on resource-constrained devices, where achieving high model performance while adhering to hardware limitations remains a key challenge. This study proposes a Quantization-Aware Training (QAT) method that combines Vector Quantization with Evolving Clustering Methods, specifically AutoCloud, DBStream, Affinity Propagation, and Mean-Shift, selected for their adaptive capabilities in dynamically partitioning model parameters during training. The method was validated using automotive sensor data for CO2 emission prediction and deployed on the Macchina A0, a compact OBD-II automotive interface widely used for real-time vehicle diagnostics. Results show a 7.14 compression rate, 22% lower energy consumption, 45% reduction in Flash memory usage, and 30% faster inference compared to baseline models. Compared to TensorFlow’s QAT Int8, the proposed approach reduced root mean squared error by 18.7%, accelerated inference by 27.4%, used 19.2% less Flash memory, and consumed 15.3% less energy. AutoCloud achieved the best trade-off between accuracy and efficiency, while Mean-Shift provided the lowest prediction error. These findings demonstrate the method’s potential for scalable and energy-efficient TinyML deployments in embedded automotive systems and other resource-constrained environments.
Flores et al. studied this question.
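To make the idea of clustering-based vector quantization concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simplified one-dimensional, flat-kernel Mean-Shift applied once to a fixed list of scalar weights, whereas the study partitions parameters dynamically during training. The function names, the `bandwidth` value, the sample weights, and the bit accounting in `compression_ratio` are all hypothetical choices made for illustration.

```python
import math
from typing import List


def mean_shift_1d(values: List[float], bandwidth: float, iters: int = 50) -> List[float]:
    """Simplified flat-kernel mean-shift on scalars; returns sorted cluster modes."""
    if not values:
        return []
    points = list(values)
    for _ in range(iters):
        shifted = []
        for p in points:
            # Average all original values within `bandwidth` of the current point.
            window = [v for v in values if abs(v - p) <= bandwidth]
            shifted.append(sum(window) / len(window))
        moved = max(abs(a - b) for a, b in zip(points, shifted))
        points = shifted
        if moved < 1e-9:
            break
    # Merge converged points that lie within bandwidth / 2 of the previous mode.
    modes: List[float] = []
    for p in sorted(points):
        if not modes or abs(p - modes[-1]) > bandwidth / 2:
            modes.append(p)
    return modes


def quantize(weights: List[float], codebook: List[float]) -> List[int]:
    """Replace each weight by the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: abs(w - codebook[k]))
        for w in weights
    ]


def dequantize(indices: List[int], codebook: List[float]) -> List[float]:
    """Reconstruct approximate weights from indices and the codebook."""
    return [codebook[i] for i in indices]


def compression_ratio(n_weights: int, n_codes: int, float_bits: int = 32) -> float:
    """Illustrative accounting (an assumption): float weights versus
    packed indices plus a float codebook."""
    index_bits = max(1, math.ceil(math.log2(n_codes)))
    original = n_weights * float_bits
    compressed = n_weights * index_bits + n_codes * float_bits
    return original / compressed


# Hypothetical weights forming three loose groups.
weights = [-0.52, -0.49, -0.51, 0.01, -0.02, 0.00, 0.48, 0.50, 0.53]
codebook = mean_shift_1d(weights, bandwidth=0.2)
indices = quantize(weights, codebook)
restored = dequantize(indices, codebook)
print(len(codebook), indices)
print(compression_ratio(len(weights), len(codebook)))
```

The design point this illustrates is that the number of codebook entries is not fixed in advance: it follows from the data and the bandwidth, which is the kind of adaptivity the abstract attributes to the clustering methods it evaluates. On a real model, the ratio depends on layer sizes and on how indices are packed, so the figure printed here is not comparable to the reported results.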