What question did this study set out to answer?

The research aims to enhance language models by combining efficient model compression with privacy preservation techniques.

May 5, 2026 | Open Access

PrunePrivyTune: Accelerating Language Models with Pruning and Differentially Private Fine-Tuning

Key Points

Developed PrunePrivyTune methodology for model compression and private fine-tuning.
Utilized pairwise cosine similarity for structural pruning of transformer layers.
Implemented Low-Rank Adaptation with Differentially Private Stochastic Gradient Descent for fine-tuning.
Achieved significant reduction in model size and inference time while maintaining performance.
Demonstrated effective mitigation of memorization risks through Differential Privacy integration.
Privacy assessment revealed strong guarantees against sensitive data exposure.
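
The similarity-based pruning step named in the key points can be illustrated with a minimal sketch. It assumes each transformer layer has already been summarized as one vector (for example, a mean hidden state over a calibration batch), and that a layer counts as redundant when its cosine similarity to an already kept layer reaches a threshold. The function names, the greedy selection rule, and the 0.95 threshold are hypothetical; this summary does not specify how the paper makes these choices.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch (assumption)
    return dot / (norm_u * norm_v)


def pairwise_layer_similarity(layer_reprs):
    """Full matrix of cosine similarities between per-layer vectors."""
    n = len(layer_reprs)
    return [
        [cosine_similarity(layer_reprs[i], layer_reprs[j]) for j in range(n)]
        for i in range(n)
    ]


def redundant_layers(layer_reprs, threshold=0.95):
    """Greedily split layer indices into (kept, pruned).

    A layer is pruned when it is at least `threshold`-similar to a layer
    that was already kept. Both the rule and the default threshold are
    illustrative assumptions, not the paper's stated procedure.
    """
    sim = pairwise_layer_similarity(layer_reprs)
    kept, pruned = [], []
    for j in range(len(layer_reprs)):
        if any(sim[i][j] >= threshold for i in kept):
            pruned.append(j)
        else:
            kept.append(j)
    return kept, pruned
```

Calling `redundant_layers(reprs)` on a list of per-layer vectors returns the indices to keep and the indices flagged as redundant; in a real model the flagged transformer blocks would then be removed before fine-tuning.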

Abstract

Large Language Models (LLMs) have demonstrated exceptional capabilities in language understanding and generation, but their large-scale architecture poses significant challenges in deployment and inference, such as increased computational demands and slower processing times. While techniques such as model pruning, knowledge distillation, and quantization have been developed to compress LLMs, they often result in task-specific compression, limiting the model’s versatility. Additionally, LLMs face privacy risks due to their potential to memorize and reproduce sensitive training data, raising concerns when they are deployed in real-world applications. To address these challenges, we propose a novel methodology, PrunePrivyTune, that combines efficient model compression with privacy-preserving fine-tuning. Our approach leverages pairwise cosine similarity to identify redundant layers in transformer models, enabling structural pruning that reduces model size without compromising performance. After pruning, we apply Low-Rank Adaptation (LoRA) with Differentially Private Stochastic Gradient Descent (DP-SGD) to fine-tune the model. This ensures that the fine-tuning process is both efficient and privacy-preserving, outperforming training and preventing the model from memorizing sensitive data. We then generate synthetic data using the fine-tuned model and conduct a training data extraction attack to assess the model’s privacy vulnerabilities in terms of perplexity and BERTScore. Our framework demonstrates that the proposed methodology effectively reduces inference time through model compression, and that pruning complements privacy when followed by private fine-tuning. Additionally, our privacy risk assessment indicates that integrating Differential Privacy (DP) successfully mitigates the risk of memorization by the model. This approach upholds strong privacy guarantees, making it highly suitable for real-time applications and deployment in sensitive domains where data confidentiality is paramount.
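
To make the private fine-tuning step more concrete, here is a minimal sketch of one generic DP-SGD update: each per-example gradient is clipped to a maximum L2 norm, the clipped gradients are summed, Gaussian noise scaled to the clipping norm is added, and the result is averaged over the batch. In a LoRA setting the parameter vector would hold only the low-rank adapter weights, with the pruned base model frozen. The function names and the flat-list parameter layout are assumptions for illustration; the paper's actual implementation and hyperparameters are not given in this summary.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale a gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0.0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update on a flat list of trainable parameters.

    `params` stands in for the LoRA adapter weights (assumption); the
    frozen base model weights are never touched by this update.
    """
    batch_size = len(per_example_grads)
    # 1. Bound each example's influence by clipping its gradient.
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    # 2. Sum the clipped gradients coordinate by coordinate.
    summed = [sum(column) for column in zip(*clipped)]
    # 3. Add Gaussian noise with std = noise_multiplier * max_norm, then average.
    sigma = noise_multiplier * max_norm
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    # 4. Plain gradient descent step on the noisy average.
    return [p - lr * g for p, g in zip(params, noisy_mean)]
```

A call such as `dp_sgd_step(adapter_weights, grads, lr=1e-3, max_norm=1.0, noise_multiplier=1.0, rng=random.Random(0))` performs one update; the values shown are placeholders. Restricting the update to a small set of adapter weights keeps per-example gradient computation cheap, which is one commonly cited reason for pairing LoRA with DP-SGD.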
