Large Language Models (LLMs) have attained state-of-the-art performance on a wide range of tasks. However, because most of these models contain billions of parameters, they consume enormous amounts of memory, computation, and energy, which poses substantial challenges for training, inference, and deployment. Conventional pruning methods reduce the parameter count effectively but at times make sub-optimal trade-offs between a model's efficiency and its performance. In this work, we employ a hybrid approach that combines LLM-specific pruning strategies with a Genetic Algorithm (GA) framework for global optimization of pruning patterns. Candidate pruned models are represented as chromosomes and successively optimized through the operations of selection, crossover, and mutation. This approach, inspired by the biological process of natural selection, enables a systematic exploration of the pruning space.
Dahiya et al. (Thu,) studied this question.
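The evolutionary loop named in the abstract (selection, crossover, mutation over chromosomes that encode pruning patterns) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the chromosome is a binary keep/prune mask over a handful of units, the `importance` scores are made-up numbers, and `fitness` is a toy objective that trades retained importance against sparsity. The authors' actual encoding, fitness function, and hyperparameters are not stated in the abstract.

```python
import random


def fitness(mask, importance, sparsity_weight=0.5):
    """Toy objective (hypothetical): retained importance plus a sparsity reward."""
    kept = sum(w for w, m in zip(importance, mask) if m)
    pruned_fraction = 1 - sum(mask) / len(mask)
    return kept / sum(importance) + sparsity_weight * pruned_fraction


def tournament_select(population, scores, rng, k=3):
    """Selection: pick k random individuals and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]


def crossover(a, b, rng):
    """Single-point crossover; assumes chromosomes have at least two genes."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(mask, rng, rate=0.05):
    """Mutation: flip each keep/prune bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in mask]


def evolve_mask(importance, pop_size=20, generations=30, seed=0):
    """Evolve a binary pruning mask (1 = keep, 0 = prune) with a simple GA."""
    rng = random.Random(seed)
    n = len(importance)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(m, importance) for m in population]
        # Elitism: carry the current best mask over unchanged.
        elite = population[max(range(pop_size), key=lambda i: scores[i])]
        children = [elite]
        while len(children) < pop_size:
            p1 = tournament_select(population, scores, rng)
            p2 = tournament_select(population, scores, rng)
            children.append(mutate(crossover(p1, p2, rng), rng))
        population = children
    return max(population, key=lambda m: fitness(m, importance))


# Hypothetical per-unit importance scores (e.g. for eight attention heads).
importance = [0.9, 0.8, 0.05, 0.02, 0.7, 0.01, 0.6, 0.03]
best_mask = evolve_mask(importance)
print(best_mask, round(fitness(best_mask, importance), 4))
```

In a real setting the fitness evaluation would involve measuring the pruned model's quality (for example, perplexity on held-out data) and its cost, which is far more expensive than this toy objective; the sketch only shows the shape of the search loop.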