In this project, we aimed to improve the computational efficiency and deployment feasibility of neural networks through mixed-precision quantization. We implemented two quantization-aware training (QAT) methods. Our results show substantial reductions in the bitwidths assigned across the model while maintaining accuracy comparable to that of full-precision models.
This work was carried out by Omar Lahyani.
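The abstract does not name the two QAT methods used. As a generic illustration of the shared building block, the sketch below is a minimal example that assumes symmetric, uniform, per-tensor quantization. It shows the quantize-dequantize ("fake quantization") step that QAT applies in the forward pass, with a different bitwidth per layer. It also shows how to compute the parameter-weighted average bitwidth of a mixed-precision assignment. The names `fake_quantize`, `average_bitwidth`, `conv1`, and `fc` are hypothetical. The code is not the project's implementation.

```python
from typing import Dict, List


def fake_quantize(weights: List[float], bits: int) -> List[float]:
    """Symmetric uniform quantize-dequantize of a weight list to `bits` bits.

    Hypothetical helper. In QAT this runs in the forward pass; the backward
    pass commonly uses a straight-through estimator (treating the gradient
    of round() as 1), which is not shown here.
    """
    if bits < 2:
        raise ValueError("symmetric quantization needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return [0.0 for _ in weights]     # all-zero tensor: nothing to scale
    scale = max_abs / qmax                # step size of the quantization grid
    out = []
    for w in weights:
        q = round(w / scale)              # snap to the nearest integer level
        q = max(-qmax, min(qmax, q))      # clamp to the representable range
        out.append(q * scale)             # dequantize back to float
    return out


def average_bitwidth(layer_sizes: Dict[str, int], layer_bits: Dict[str, int]) -> float:
    """Parameter-weighted average bitwidth of a per-layer bit assignment."""
    total = sum(layer_sizes.values())
    return sum(layer_sizes[name] * layer_bits[name] for name in layer_sizes) / total


if __name__ == "__main__":
    # Hypothetical two-layer model: the larger layer keeps 8 bits, the smaller gets 4.
    layers = {"conv1": [0.5, -0.25, 0.1, -1.0], "fc": [0.3, -0.7]}
    bits = {"conv1": 8, "fc": 4}
    quantized = {name: fake_quantize(w, bits[name]) for name, w in layers.items()}
    sizes = {name: len(w) for name, w in layers.items()}
    print(quantized)
    print(average_bitwidth(sizes, bits))  # (4*8 + 2*4) / 6 parameters
```

Fake quantization keeps the weights in floating point during training, so the network learns to tolerate the rounding error of its assigned bitwidth. Each layer's bitwidth is a separate knob. That is what makes a mixed-precision search meaningful.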