March 3, 2026 | Open Access

Mixed-precision quantization techniques for energy-efficient DNN inference

Key Points

Results indicate that mixed-precision quantization significantly reduces the bitwidths assigned across the model, improving computational efficiency.
Two quantization-aware training (QAT) methods were implemented, showing the potential to maintain accuracy while lowering resource demands.
The analysis showed that neural networks can be deployed with reduced bitwidths, supporting their feasibility in practical applications.
These findings suggest that such techniques can enable energy-efficient inference, which is crucial for large-scale deployment of deep learning models.
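
To make "reduced bitwidth assignments" concrete: under a mixed-precision scheme each layer is given its own bitwidth, so weight storage scales with the parameter-weighted average bitwidth. The minimal sketch below computes that average and the resulting compression relative to a 32-bit floating-point baseline. The function names, layer sizes, and bitwidths are hypothetical placeholders, not figures from the study, and the calculation ignores activations and quantization metadata such as scale factors.

```python
def average_bitwidth(param_counts, bitwidths):
    """Parameter-weighted mean bitwidth of a per-layer (mixed-precision) assignment."""
    if len(param_counts) != len(bitwidths):
        raise ValueError("need exactly one bitwidth per layer")
    total_bits = sum(n * b for n, b in zip(param_counts, bitwidths))
    return total_bits / sum(param_counts)


def compression_vs_fp32(param_counts, bitwidths):
    """How many times smaller the weights are than a 32-bit floating-point baseline."""
    return 32.0 / average_bitwidth(param_counts, bitwidths)


# Hypothetical three-layer model: the large middle layer is assigned 2-bit weights,
# while the smaller first and last layers are kept at 8 bits.
layer_params = [100_000, 200_000, 100_000]
layer_bits = [8, 2, 8]

print(average_bitwidth(layer_params, layer_bits))     # 5.0
print(compression_vs_fp32(layer_params, layer_bits))  # 6.4
```

Weighting by parameter count matters here: a plain mean of the three bitwidths would be 6, overstating the storage cost because the 2-bit layer holds half of the parameters.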

Abstract

In this project, we aimed to improve the computational efficiency and deployment feasibility of neural networks through mixed-precision quantization. We implemented two quantization-aware training (QAT) methods. Our results demonstrated significant reductions in the bitwidths assigned to the model while maintaining accuracy comparable to that of full-precision models.
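
The summary does not say which two QAT methods were used. As a generic illustration only, not the authors' implementation, the sketch below shows the core mechanics that QAT methods commonly share: symmetric uniform "fake quantization" in the forward pass at a per-layer bitwidth, and a straight-through estimator (STE) in the backward pass that updates full-precision shadow weights. The function names `fake_quantize`, `ste_grad`, and `qat_update`, the fixed clipping range `clip`, and the use of plain Python lists are all assumptions chosen to keep the example self-contained.

```python
def fake_quantize(values, bits, clip):
    """Symmetric uniform fake quantization: snap each value to a signed integer
    grid with 2**(bits-1) - 1 positive levels, then map it back to a float."""
    if bits < 2:
        raise ValueError("signed symmetric quantization needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1   # 7 for 4 bits, 1 for 2 bits
    scale = clip / qmax          # distance between adjacent grid points
    out = []
    for v in values:
        v_clipped = max(-clip, min(clip, v))   # saturate to [-clip, clip]
        q = round(v_clipped / scale)           # integer code in [-qmax, qmax]
        out.append(q * scale)                  # dequantized value used in the forward pass
    return out


def ste_grad(values, upstream, clip):
    """Straight-through estimator: treat rounding as the identity, so the
    gradient passes through unchanged except where the value was clipped."""
    return [g if abs(v) <= clip else 0.0 for v, g in zip(values, upstream)]


def qat_update(shadow_weights, grad_wrt_quantized, lr, clip):
    """One SGD step on the full-precision 'shadow' weights kept during QAT."""
    grads = ste_grad(shadow_weights, grad_wrt_quantized, clip)
    return [w - lr * g for w, g in zip(shadow_weights, grads)]


weights = [0.3, -2.0, 0.05, 0.9]
print(fake_quantize(weights, bits=4, clip=1.0))  # 15 levels: 0.3 -> 2/7, -2.0 -> -1.0, 0.9 -> 6/7
print(fake_quantize(weights, bits=2, clip=1.0))  # [0.0, -1.0, 0.0, 1.0]
print(ste_grad(weights, [1.0, 1.0, 1.0, 1.0], clip=1.0))  # [1.0, 0.0, 1.0, 1.0]
```

In a mixed-precision setting, each layer would call `fake_quantize` with its own `bits`; the 2-bit call shows how coarse the grid becomes. Practical QAT setups usually also calibrate or learn `clip` per tensor, which this sketch keeps fixed for simplicity.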

Omar Lahyani studied this question.

synapsesocial.com/papers/69a75e67c6e9836116a28fd7
