Guardrails have emerged as an alternative to safety alignment for content moderation of large language models (LLMs). Existing model-based guardrails were not designed for resource-constrained portable devices such as mobile phones, a growing number of which run LLM-based applications locally. We introduce LoRA-Guard, a parameter-efficient guardrail adaptation method that relies on knowledge sharing between LLMs and guardrail models. LoRA-Guard extracts language features from the LLM and adapts them to the content moderation task using low-rank adapters, while a dual-path design prevents any performance degradation on the generative task. We show that LoRA-Guard outperforms existing approaches with 100-1000x lower parameter overhead while maintaining accuracy, enabling on-device content moderation.
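The dual-path idea can be illustrated with a minimal sketch. This is not the authors' implementation: the class `DualPathLinear`, its methods, and the toy dimensions are hypothetical, and it assumes that the frozen base weight is shared by both paths while the low-rank update is applied only on the guard path. Under that assumption, the generative path never sees the adapter, so its output is unchanged regardless of what the adapter learns.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class DualPathLinear:
    """Hypothetical linear layer with a frozen base weight and a LoRA adapter.

    Assumption for illustration: the adapter (B @ A) is active only on the
    guard path; the generative path uses the frozen weight W alone.
    """

    def __init__(self, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # Frozen base weight, shared by both paths.
        self.W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
        # Low-rank factors: A is (rank x d_in), B is (d_out x rank).
        # B starts at zero, so the adapter initially changes nothing.
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x, guard=False):
        y = matvec(self.W, x)
        if guard:
            # Guard path only: add the low-rank update B @ (A @ x).
            delta = matvec(self.B, matvec(self.A, x))
            y = [yi + di for yi, di in zip(y, delta)]
        return y

    def adapter_params(self):
        """Parameters added by the adapter: rank * (d_in + d_out)."""
        return self.rank * (self.d_in + self.d_out)

    def base_params(self):
        """Parameters in the frozen base weight: d_in * d_out."""
        return self.d_in * self.d_out


layer = DualPathLinear(d_in=8, d_out=8, rank=2)
x = [1.0] * 8
generative_before = layer.forward(x, guard=False)

# Simulate training the adapter by overwriting B with non-zero values.
layer.B = [[0.5] * layer.rank for _ in range(layer.d_out)]

generative_after = layer.forward(x, guard=False)
guard_out = layer.forward(x, guard=True)
```

In this toy setting the adapter adds `rank * (d_in + d_out)` parameters on top of the `d_in * d_out` frozen ones, which is where the parameter savings of a low-rank adapter come from when the rank is small relative to the layer width. The specific overhead figures reported for LoRA-Guard depend on the models and ranks used in the paper and are not reproduced by this sketch.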
Elesedy et al. studied this question.
www.synapsesocial.com/papers/68e6191db6db6435875abdb8 — DOI: https://doi.org/10.48550/arxiv.2407.02987
Hayder Elesedy
Pedro M. Esperança
Silviu Vlad Oprea