July 3, 2024 · Open Access

LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content Moderation of Large Language Models

Abstract

Guardrails have emerged as an alternative to safety alignment for content moderation of large language models (LLMs). Existing model-based guardrails have not been designed for resource-constrained portable devices, such as mobile phones, a growing number of which run LLM-based applications locally. We introduce LoRA-Guard, a parameter-efficient guardrail adaptation method that relies on knowledge sharing between LLMs and guardrail models. LoRA-Guard extracts language features from the LLMs and adapts them for the content moderation task using low-rank adapters, while a dual-path design prevents any performance degradation on the generative task. We show that LoRA-Guard outperforms existing approaches with 100-1000x lower parameter overhead while maintaining accuracy, enabling on-device content moderation.
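
The dual-path idea in the abstract can be illustrated with a toy layer. This is a minimal sketch and not the authors' implementation: the class name `DualPathLayer`, the `guard` flag, the layer sizes, and the zero initialisation of `B` are all assumptions made for illustration. It shows a frozen base weight shared by both paths, a low-rank update applied only on the guard path, and the resulting parameter overhead for one layer.

```python
import random


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class DualPathLayer:
    """Toy linear layer: frozen base weight W plus a low-rank adapter B @ A.

    Hypothetical illustration only; names and shapes are assumptions.
    """

    def __init__(self, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        # Frozen weight of the generative model, shared by both paths.
        self.W = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_out)]
        # Low-rank adapter factors, the only parameters trained for moderation.
        self.A = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]  # zero init (assumed)

    def forward(self, x, guard=False):
        base = matvec(self.W, x)
        if not guard:
            # Generative path: the adapter is never applied.
            return base
        # Guard path: add the low-rank update B @ (A @ x).
        delta = matvec(self.B, matvec(self.A, x))
        return [b + d for b, d in zip(base, delta)]

    def base_params(self):
        return len(self.W) * len(self.W[0])

    def adapter_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


layer = DualPathLayer(d_in=64, d_out=64, rank=2)
x = [1.0] * 64

gen_before = layer.forward(x)
layer.B[0][0] = 0.5  # stand-in for a training update to the adapter
gen_after = layer.forward(x)
guard_out = layer.forward(x, guard=True)

print(gen_before == gen_after)  # generative output is unaffected by the adapter
print(layer.base_params(), layer.adapter_params())
```

In this toy setting the adapter adds 256 parameters to a 4096-parameter layer. The ratio depends entirely on the assumed rank and layer size and says nothing about the overhead figures reported in the paper.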

Cite this study

Elesedy et al. (July 3, 2024) studied this question.

www.synapsesocial.com/papers/68e6191db6db6435875abdb8 — DOI: https://doi.org/10.48550/arxiv.2407.02987

Authors

Hayder Elesedy

Pedro M. Esperança

Silviu Vlad Oprea
