May 31, 2024

Open Access

LCQ: Low-Rank Codebook based Quantization for Large Language Models

Abstract

Large language models (LLMs) have recently demonstrated promising performance in many tasks. However, their high storage and computational cost has become a challenge for deployment. Weight quantization has been widely used for model compression, since it can reduce both storage and computational cost. Most existing weight quantization methods for LLMs use a rank-one codebook for quantization, which results in substantial accuracy loss when the compression ratio is high. In this paper, we propose a novel weight quantization method for LLMs, called low-rank codebook based quantization (LCQ). LCQ adopts a low-rank codebook, the rank of which can be larger than one, for quantization. Experiments show that LCQ can achieve better accuracy than existing methods with negligible extra storage cost.
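The distinction between a rank-one and a low-rank codebook can be illustrated with a small sketch. It rests on an assumption that the abstract does not spell out: that a "rank-one codebook" is a per-group codebook matrix formed as the outer product of a vector of per-group scales and a shared grid of quantization levels, and that a low-rank codebook is a product of two thin factor matrices. All names below (`matmul`, `quantize`, `sq_error`, the toy weights and factors) are hypothetical, and the second factor is picked by hand for illustration; how LCQ actually builds or learns its codebook is not described here.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: a is G x r, b is r x K, result is G x K."""
    return [[sum(a_row[i] * b[i][k] for i in range(len(b)))
             for k in range(len(b[0]))]
            for a_row in a]


def quantize(weights: Matrix, codebook: Matrix) -> Tuple[List[List[int]], Matrix]:
    """Map every weight in group g to the nearest entry of codebook row g.

    Returns the chosen indices and the dequantized weights.
    """
    indices, dequant = [], []
    for w_row, c_row in zip(weights, codebook):
        idx = [min(range(len(c_row)), key=lambda k: abs(w - c_row[k]))
               for w in w_row]
        indices.append(idx)
        dequant.append([c_row[k] for k in idx])
    return indices, dequant


def sq_error(a: Matrix, b: Matrix) -> float:
    """Sum of squared differences between two equally shaped matrices."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Toy data: two weight groups, 2-bit quantization (K = 4 codebook entries).
weights = [[-0.9, -0.2, 0.1, 0.8],
           [0.0, 0.1, 0.2, 3.0]]   # second group is skewed by an outlier

levels = [[-1.5, -0.5, 0.5, 1.5]]  # shared uniform grid, 1 x K
scales = [[0.6], [2.0]]            # one scale per group, G x 1

# Rank-one codebook (assumed form): outer product of scales and levels.
rank_one = matmul(scales, levels)

# Rank-two codebook (assumed form): one extra column and one extra row.
# The extra factor is hand-picked here, not learned.
factors_a = [[0.6, 0.0], [2.0, 1.0]]                    # G x 2
factors_b = [levels[0], [1.0, 1.0, 0.0, 0.0]]           # 2 x K
rank_two = matmul(factors_a, factors_b)

_, deq_one = quantize(weights, rank_one)
_, deq_two = quantize(weights, rank_two)
err_rank_one = sq_error(weights, deq_one)
err_rank_two = sq_error(weights, deq_two)

print("rank-one codebook:", rank_one, "error:", err_rank_one)
print("rank-two codebook:", rank_two, "error:", err_rank_two)
```

In this toy case the extra rank lets the second group's codebook row depart from a scaled copy of the shared grid, which lowers its reconstruction error. Under the assumed factorization, a rank-r codebook for G groups and K entries stores G·r + r·K numbers, so raising r by one adds only G + K values on top of the quantized indices.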

Cite This Study

Cai et al. (2024) studied this question.

synapsesocial.com/papers/68e6785bb6db643587602a68 https://doi.org/10.48550/arxiv.2405.20973
