December 19, 2025 | Open Access

Efficient Inference for Edge Large Language Models: A Survey

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities in natural language processing. Their massive computational and memory requirements often necessitate cloud-based deployment, which introduces challenges related to cost, latency, privacy, and network reliability. Deploying LLMs on-device alleviates these challenges but is hindered by the severe resource constraints of edge hardware. This survey reviews efficient inference techniques for edge LLMs, with a focus on two key strategies: speculative decoding and model offloading. We categorize these strategies into single-device and multi-device types and systematically analyze their principles, recent advancements, implementations, and support within edge frameworks. Finally, we highlight the open challenges and future research directions that will advance the field of edge LLM inference.
