What question did this study set out to answer?

To characterize scheduling overhead in LLM agent pipelines and to determine when scheduling tool calls with a new framework, TACS, reduces end-to-end latency.

March 19, 2026 | Open Access

Scheduling LLM Tool Calls as OS Processes: Empirical Characterization of Overhead and Topology-Dependent Performance in Agentic Pipelines

Key Points

Introduced the Tool-Aware Call Scheduler (TACS) framework.
Modeled tool calls as operating system processes with latency estimates, priorities, and DAG-structured dependencies.
Implemented three scheduling strategies: FIFO Sequential (baseline), HEFT Parallel, and Deadline-Aware Priority.
Scheduling adds net overhead for low-latency tasks.
Measurable latency gains appear once tool latency exceeds a threshold of ~1.5 s.
HEFT Parallel scheduling achieved up to 5.9% latency reduction in high-latency coding tasks.
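
The process model and the contrast between sequential and DAG-parallel scheduling can be illustrated with a small sketch. It is not the TACS implementation: `ToolCall`, `fifo_makespan`, `upward_ranks`, and `heft_style_makespan` are hypothetical names, and the sketch assumes identical workers, zero communication cost between calls, fixed latency estimates, and no scheduling overhead (the quantity the study actually measures). Under those assumptions HEFT reduces to upward-rank list scheduling; HEFT's insertion-based slot filling is omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """A tool call modeled as a process (hypothetical field names)."""
    name: str
    est_latency: float                             # estimated latency in seconds (> 0)
    priority: int = 0                              # part of the model; unused by these two schedulers
    deps: list[str] = field(default_factory=list)  # calls that must finish first


def fifo_makespan(calls: list[ToolCall]) -> float:
    """FIFO Sequential baseline: one call at a time, so total time is the sum."""
    return sum(c.est_latency for c in calls)


def upward_ranks(calls: list[ToolCall]) -> dict[str, float]:
    """HEFT upward rank: own latency plus the longest path to any exit call.

    Assumes the dependency graph is acyclic.
    """
    by_name = {c.name: c for c in calls}
    succs: dict[str, list[str]] = {c.name: [] for c in calls}
    for c in calls:
        for d in c.deps:
            succs[d].append(c.name)
    rank: dict[str, float] = {}

    def rank_of(n: str) -> float:
        if n not in rank:
            rank[n] = by_name[n].est_latency + max(
                (rank_of(s) for s in succs[n]), default=0.0)
        return rank[n]

    for c in calls:
        rank_of(c.name)
    return rank


def heft_style_makespan(calls: list[ToolCall], workers: int = 4) -> float:
    """List-schedule calls by decreasing upward rank onto identical workers."""
    if any(c.est_latency <= 0 for c in calls):
        # Positive latencies guarantee rank order is also a valid dependency order.
        raise ValueError("latency estimates must be positive")
    rank = upward_ranks(calls)
    finish: dict[str, float] = {}
    worker_free = [0.0] * workers
    for c in sorted(calls, key=lambda c: -rank[c.name]):
        ready = max((finish[d] for d in c.deps), default=0.0)
        # With identical workers, earliest start time also gives earliest finish time.
        w = min(range(workers), key=lambda i: max(worker_free[i], ready))
        start = max(worker_free[w], ready)
        finish[c.name] = start + c.est_latency
        worker_free[w] = finish[c.name]
    return max(finish.values(), default=0.0)


if __name__ == "__main__":
    # Hypothetical research task: one search fans out to two fetches, then a summary.
    calls = [
        ToolCall("search", 2.0),
        ToolCall("fetch_a", 1.5, deps=["search"]),
        ToolCall("fetch_b", 1.5, deps=["search"]),
        ToolCall("summarize", 1.0, deps=["fetch_a", "fetch_b"]),
    ]
    print(fifo_makespan(calls))        # 6.0
    print(heft_style_makespan(calls))  # 4.5
```

The gain in this toy DAG comes entirely from running the two fetches concurrently; with `workers=1` the parallel schedule collapses back to 6.0 s. A real scheduler also pays a per-call cost for ranking and dispatch, which is why short tool calls can end up slower under scheduling.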

Abstract

Large Language Model (LLM) agents execute complex tasks through sequences of tool calls, but existing frameworks process these calls sequentially. This paper introduces TACS (Tool-Aware Call Scheduler), a framework that models tool calls as operating system processes with latency estimates, priorities, and DAG-structured dependencies. We implement three scheduling strategies (FIFO Sequential as the baseline, HEFT Parallel, and Deadline-Aware Priority) and evaluate them across 90 real-API runs spanning research, data-analysis, and coding tasks. The results show that scheduling adds net overhead for low-latency tasks but yields measurable gains once tool latency exceeds a threshold of approximately 1.5 s. HEFT achieves a latency reduction of up to 5.9% on high-latency coding tasks. This work provides the first empirical characterization of scheduling overhead in LLM agent pipelines and introduces a practical threshold-based criterion for scheduler selection. Code: https://github.com/ekushal02/TACS
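
A threshold-based scheduler-selection criterion can be sketched as a simple decision rule. Only the ~1.5 s value comes from the study; the function name `select_scheduler`, the use of the mean estimated tool latency as the deciding signal, and the strict `>` comparison are assumptions made for illustration, and the paper's actual criterion may aggregate latencies differently.

```python
from statistics import mean

# Threshold reported in the study (~1.5 s); treated here as a tunable default.
LATENCY_THRESHOLD_S = 1.5


def select_scheduler(tool_latencies_s: list[float],
                     threshold_s: float = LATENCY_THRESHOLD_S) -> str:
    """Pick a scheduler from estimated per-call tool latencies (seconds).

    Assumption: the decision keys on the mean estimated latency of the
    task's tool calls. Below the threshold, scheduling overhead is expected
    to outweigh any parallelism gain, so the sequential baseline is kept.
    """
    if not tool_latencies_s:
        return "FIFO Sequential"  # nothing to schedule
    if mean(tool_latencies_s) > threshold_s:
        return "HEFT Parallel"
    return "FIFO Sequential"


if __name__ == "__main__":
    print(select_scheduler([0.2, 0.4, 0.3]))  # FIFO Sequential
    print(select_scheduler([2.5, 3.0, 1.8]))  # HEFT Parallel
```

Keeping the threshold as a parameter reflects that ~1.5 s is an empirical value from one set of APIs and workloads; a deployment would likely need to re-estimate it for its own tools.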
