March 12, 2025

PQNAS: Mixed-precision Quantization-aware Neural Architecture Search with Pseudo Quantizer

Abstract

Quantization-aware neural architecture search (NAS) is an efficient way to automatically find the best quantized model that meets the resource constraints of edge devices. Existing methods train the quantized supernet with the straight-through estimator, which causes the learned weights to oscillate. To address this issue, we introduce pseudo quantization noise (PQN) into quantization-aware NAS. An accurate PQN range is vital to the accuracy of the quantized networks. However, producing accurate noise for activation quantization during supernet training is challenging, because it requires a precise estimate of the quantization parameters for each subnet: subnets have different activation distributions and therefore different noise ranges. To this end, we propose PQNAS, a mixed-precision quantization-aware NAS framework with a Pseudo Quantizer. We propose adaptive quantization parameters (AQP), which are trained with the distribution of activations in the pseudo quantizer. With AQP, we can obtain an accurate noise range for each subnet during training. In experiments, compared with the existing method, PQNAS improves Top-1 accuracy by 0.99% to 6.48% on the ImageNet-1K dataset and mAP by 1.1% on the COCO dataset.
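
To illustrate the general idea of pseudo quantization noise, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a uniform quantizer over a fixed range and models rounding as additive uniform noise of at most half a quantization step. The function names, the fixed range `[-3, 3]`, and the bit-widths are hypothetical choices for illustration; the abstract states that PQNAS instead learns adaptive quantization parameters (AQP) per subnet, which this sketch does not attempt.

```python
import numpy as np

def step_size(x_min, x_max, bits):
    """Step of a uniform quantizer with 2**bits levels over [x_min, x_max]."""
    return (x_max - x_min) / (2 ** bits - 1)

def fake_quantize(x, x_min, x_max, bits):
    """Round-to-nearest uniform quantization (non-differentiable rounding)."""
    delta = step_size(x_min, x_max, bits)
    x = np.clip(x, x_min, x_max)
    return np.round((x - x_min) / delta) * delta + x_min

def pseudo_quantize(x, x_min, x_max, bits, rng):
    """Replace rounding with additive uniform noise in [-delta/2, delta/2)."""
    delta = step_size(x_min, x_max, bits)
    x = np.clip(x, x_min, x_max)
    noise = rng.uniform(-0.5, 0.5, size=x.shape) * delta
    return x + noise

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=10_000)  # stand-in activations (assumption)
x_min, x_max = -3.0, 3.0                  # assumed fixed range, not learned

for bits in (2, 4, 8):
    delta = step_size(x_min, x_max, bits)
    clipped = np.clip(acts, x_min, x_max)
    pqn_err = np.abs(pseudo_quantize(acts, x_min, x_max, bits, rng) - clipped)
    rnd_err = np.abs(fake_quantize(acts, x_min, x_max, bits) - clipped)
    print(f"bits={bits} delta={delta:.4f} "
          f"max_pqn_err={pqn_err.max():.4f} max_round_err={rnd_err.max():.4f}")
```

In this sketch the noise amplitude depends directly on the assumed range and bit-width, so a range that does not match a subnet's activation distribution yields noise that misrepresents the true rounding error. That dependence is the reason the abstract gives for estimating quantization parameters per subnet.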
