February 28, 2024

A Quantization Approach for the Reduced Size of Large Language Models

Abstract

Large language models (LLMs) are widely used across a range of applications, including natural language processing and multimodal tasks. However, these models are computationally intensive. This work presents a novel quantization approach that reduces the size of publicly available LLMs, including Llama-2-7B, GPT-J, and LLaMA, using a parameter-efficient fine-tuning (PEFT) library. In the experiments, the quantized versions of the LLMs showed a considerable reduction in memory size and significantly improved operational efficiency. This quantization process has the potential to bridge the gap between sophisticated language models and practical deployment scenarios, opening opportunities for the use of large language models in resource-constrained applications.
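To make the memory claim concrete, here is a minimal, illustrative sketch of why lowering the bit width per weight shrinks a model. It is not the authors' method: the abstract does not state which quantization scheme was used, so the symmetric absmax int8 scheme below is a swapped-in textbook example, the function names are hypothetical, and the 7 billion parameter count is an assumed nominal figure suggested by the name Llama-2-7B.

```python
def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


def quantize_absmax_int8(weights):
    """Symmetric (absmax) quantization of floats to integers in [-127, 127].

    Illustrative scheme only; the paper's actual scheme is not given
    in the abstract.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the stored integers back to approximate float weights."""
    return [q * scale for q in quantized]


# Assumed nominal parameter count (hypothetical, for illustration).
NUM_PARAMS = 7_000_000_000
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")

# Round-trip a few made-up weights to show the precision cost.
sample = [0.8, -1.0, 0.031, 0.0]
q, scale = quantize_absmax_int8(sample)
print(q, dequantize(q, scale))
```

Storage scales linearly with bits per weight, so moving from 16-bit to 4-bit weights cuts weight memory by roughly a factor of four, at the cost of a rounding error of at most half a quantization step per weight.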

Cite This Study

Kodali et al. (Wed,) studied this question.

synapsesocial.com/papers/68e7720db6db6435876e7208 https://doi.org/10.1109/kst61284.2024.10499664
