Large language models (LLMs) are widely used across a range of applications, including natural language processing and multimodal tasks, but they are computationally intensive. This work presents a novel approach for reducing the size of publicly available LLMs, including Llama-2-7B, GPT-J, and LLaMA, using a parameter-efficient fine-tuning (PEFT) library. In the experiments, the quantized versions of the models showed a considerable reduction in memory size and significantly improved operational efficiency. Quantization therefore has the potential to bridge the gap between sophisticated language models and practical deployment scenarios, opening the way for large language models in resource-constrained applications.
Kodali et al. (Wed,) studied this question.
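
To give a rough sense of why quantization reduces memory size, the sketch below estimates the storage needed for model weights alone at different precisions. It is back-of-the-envelope arithmetic, not the authors' method or their measured results. The parameter count (a nominal 7 billion, suggested by the name Llama-2-7B) and the bit widths (16, 8, and 4) are assumptions chosen for illustration; the abstract does not state them. The helper name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: int, bits_per_param: int) -> float:
    """Approximate storage for the weights alone, in GiB.

    Ignores activations, the KV cache, and the small overhead of
    quantization metadata such as per-block scales.
    """
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 2**30


# Assumption: nominal "7B" parameter count, used only for illustration.
NUM_PARAMS = 7_000_000_000

# Assumption: these precisions are examples, not the paper's settings.
for label, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")
```

Under these assumptions, halving the bits per parameter halves the weight storage, so a 4-bit representation needs about a quarter of the memory of a 16-bit one. Real quantized checkpoints are somewhat larger because some layers are typically kept at higher precision.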