What does this research mean for the field?

LLMGuard provides an efficient way to safeguard large language models during real-time inference on edge devices, achieving a 43× inference speedup over fully-shielded methods with negligible accuracy loss. Novelty: novel finding. Consensus alignment: challenges consensus.

What question did this study set out to answer?

The research aims to enhance the security of large language models during real-time inference on edge devices, addressing vulnerabilities to model theft.

February 27, 2026

LLMGuard: Safeguarding Real-Time Inference for Large Language Models on Edge Devices

Key points

Developed a Bayesian theoretical framework for model stealing (MS) attacks.
Introduced Intrinsic Parameters Shielding to protect private model parameters.
Implemented Random Slices Composition to obfuscate intermediate distributions.
Conducted experimental validation on the LLaMA-7B model.
Achieved a 43× inference speedup compared to fully-shielded methods.
Downgraded the model to black-box inference with negligible accuracy loss.
Significantly reduced secure memory requirements for large language models.
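The Bayesian framework listed above splits model stealing into prior and posterior knowledge leakage. One way to picture that split is plain Bayes' rule, in notation of our own choosing (the paper's actual formulation may differ): let θ be the private parameters, K the knowledge an attacker holds before observing inference (for example, parameters exposed outside the secure environment), and O the observations gathered during inference (for example, intermediate distributions).

```latex
\underbrace{p(\theta \mid \mathcal{K}, \mathcal{O})}_{\text{attacker's belief after the attack}}
\;\propto\;
\underbrace{p(\mathcal{O} \mid \theta, \mathcal{K})}_{\text{posterior knowledge from observations}}
\;\cdot\;
\underbrace{p(\theta \mid \mathcal{K})}_{\text{prior knowledge}}
```

Read this way, Intrinsic Parameters Shielding targets the prior term p(θ | K), while Random Slices Composition targets the observation term, which carries what an attacker learns from intermediate results.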

Abstract

Secure inference shielded by a trusted execution environment (TEE) offers an efficient way to protect valuable edge-deployed models from theft. However, existing methods lack a theoretical security analysis and therefore fail to achieve optimal security. Furthermore, while feasible for small models, existing methods are far too heavyweight for large language models (LLMs): for LLaMA-7B, they require gigabytes of secure memory and increase inference latency a hundredfold, severely compromising real-time utility. To solve these problems, we first present a Bayesian theoretical framework for model stealing (MS) attacks, which decomposes MS into prior and posterior knowledge leakage. Based on this framework, we propose LLMGuard, which consists of two components. First, Intrinsic Parameters Shielding shields all private parameters, preventing prior knowledge leakage; this significantly reduces secure memory usage and speeds up inference. Second, because one-time pad (OTP) encryption is not applicable to LLMs, Random Slices Composition obfuscates intermediate distributions with no computational overhead, efficiently minimizing posterior knowledge leakage. Experimental results show that LLMGuard downgrades the model to black-box inference with negligible accuracy loss, while delivering a 43× inference speedup on LLaMA compared to fully-shielded methods. LLMGuard thus addresses concerns about intellectual property theft on edge devices and supports the secure deployment of LLMs on untrusted devices.
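The abstract mentions one-time pad (OTP) encryption without showing how it is used. As background only, the sketch below shows generic OTP-style blinding of a single linear layer offloaded from a TEE: the trusted side adds a fresh random mask r to the input, the untrusted device computes W(x + r), and the trusted side subtracts Wr to recover Wx. This is not LLMGuard's method, and the abstract does not say why OTP fails for LLMs. The modulus `P` and the helpers `matvec` and `offloaded_linear` are hypothetical choices made for illustration.

```python
import random

# Hypothetical prime modulus (2**31 - 1): OTP-style blinding is done over a
# finite field so that a uniformly random mask r hides x completely.
P = 2**31 - 1


def matvec(W, x, p=P):
    """Compute W @ x modulo p, where W is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) % p for row in W]


def offloaded_linear(W, x, rng, p=P):
    """Evaluate W @ x on an untrusted device without revealing x to it."""
    # Trusted side (TEE): draw a fresh one-time pad r and mask the input.
    r = [rng.randrange(p) for _ in x]
    x_masked = [(xi + ri) % p for xi, ri in zip(x, r)]

    # Untrusted side: only ever sees x + r, which is uniformly random mod p.
    y_masked = matvec(W, x_masked, p)

    # Trusted side: remove the mask's contribution W @ r. This term can be
    # precomputed offline; it is computed inline here to keep the sketch short.
    w_r = matvec(W, r, p)
    return [(ym - wr) % p for ym, wr in zip(y_masked, w_r)]


if __name__ == "__main__":
    rng = random.Random(0)
    W = [[1, 2, 3], [4, 5, 6]]
    x = [7, 8, 9]
    print(offloaded_linear(W, x, rng))  # [50, 122]
    print(matvec(W, x))                 # [50, 122]
```

The mask cancels only because W is linear: W(x + r) - Wr = Wx. Nonlinear operations therefore cannot be applied to the masked values directly.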
