What type of study is this?

This is a Literature Review study.

September 17, 2025

Database Perspective on LLM Inference Systems

Key Points

Inference costs can be lowered by managing uncertain request lifecycles and exploiting specialized hardware.
The techniques are presented from a database perspective: request processing, model execution and optimization, and memory management.
Distributed inference over multiple devices enhances the scalability of applications powered by large language models.
Inference systems combine these techniques in diverse architectures to meet application or performance objectives.

Abstract

Large language models (LLMs) are powering a new wave of language-based applications, including database applications, leading to new techniques and systems for dealing with the enormous compute and memory needs of LLMs, coupled with advances in computing hardware. In this tutorial, we review how these techniques lower inference costs by managing uncertain request lifecycles, exploiting specialized hardware, and scaling over distributed inference devices and machines. We present these techniques from the database perspective of request processing, model execution and optimization, and memory management. Following this discussion, we review how inference systems combine these techniques in diverse architectures to achieve application or performance objectives.
