Abstract

Model compression is key for deploying Deep Neural Networks on resource-constrained hardware. Its application to Neural Network Controllers (NNCs) is challenging because it can compromise control-theoretic properties and performance, a critical issue for modern controllers that use latent-space models. This paper surveys and empirically evaluates compression techniques for these models, using an MNIST autoencoder and a Temporal Difference Model Predictive Control agent as test cases across diverse hardware. We find that general compression techniques apply to latent-space models and that careful compression can preserve the theoretical properties of NNCs. Specific findings indicate that quantization can increase latency on non-specialized hardware, fine-tuning is crucial for performance recovery, and hybrid methods yield the best trade-offs.
Sundaram et al. (Sun,) studied this question.