Large Language Model (LLM) agents execute complex tasks through sequences of tool calls, but existing frameworks process these calls sequentially. This paper introduces TACS (Tool-Aware Call Scheduler), a framework that models tool calls as operating system processes with latency estimates, priorities, and DAG-structured dependencies. We implement three scheduling strategies (FIFO Sequential as the baseline, HEFT Parallel, and Deadline-Aware Priority) and evaluate them across 90 real-API runs spanning research, data analysis, and coding tasks. Results show that scheduling introduces overhead for low-latency tasks but provides measurable gains once tool latency exceeds a threshold of roughly 1.5 s. HEFT achieves up to a 5.9% latency reduction in high-latency coding tasks. This work provides the first empirical characterization of scheduling overhead in LLM agent pipelines and introduces a practical threshold-based criterion for scheduler selection. Code: https://github.com/ekushal02/TACS
Kushal Erramilli (Tue,) studied this question.
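
To make the idea of DAG-structured tool calls and a threshold-based scheduler choice concrete, the following is a minimal sketch and not the TACS implementation. The names `ToolCall`, `sequential_makespan`, `parallel_makespan`, and `choose_scheduler` are hypothetical, the parallel estimate is a plain critical-path computation with unlimited workers (not HEFT), and applying the roughly 1.5 s threshold to the mean estimated latency is an assumption; the abstract does not say how the threshold is measured.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical record for one tool call in an agent plan."""
    name: str
    est_latency_s: float      # estimated latency in seconds
    deps: tuple = ()          # names of calls that must finish first


def sequential_makespan(calls):
    """FIFO-style estimate: every call runs one after another."""
    return sum(c.est_latency_s for c in calls)


def parallel_makespan(calls):
    """Critical-path estimate, assuming unlimited parallel workers.

    This is a simplification for illustration, not HEFT.
    """
    by_name = {c.name: c for c in calls}
    graph = {c.name: set(c.deps) for c in calls}
    finish = {}
    # static_order() yields each call only after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        call = by_name[name]
        start = max((finish[d] for d in call.deps), default=0.0)
        finish[name] = start + call.est_latency_s
    return max(finish.values(), default=0.0)


# Assumed reading of the "~1.5s" threshold from the abstract.
LATENCY_THRESHOLD_S = 1.5


def choose_scheduler(calls, threshold_s=LATENCY_THRESHOLD_S):
    """Pick "parallel" only when mean estimated latency exceeds the threshold."""
    if not calls:
        return "fifo"
    mean_latency = sequential_makespan(calls) / len(calls)
    return "parallel" if mean_latency > threshold_s else "fifo"


# Illustrative plan with made-up latencies: two fetches depend on one search.
calls = [
    ToolCall("search", 2.0),
    ToolCall("fetch_a", 3.0, deps=("search",)),
    ToolCall("fetch_b", 2.5, deps=("search",)),
    ToolCall("summarize", 1.0, deps=("fetch_a", "fetch_b")),
]

print(sequential_makespan(calls))  # sum of all estimated latencies
print(parallel_makespan(calls))    # length of the longest dependency chain
print(choose_scheduler(calls))
```

In this made-up plan the two fetches can overlap, so the critical-path estimate is shorter than the sequential sum. A plan made only of sub-second calls would fall below the assumed threshold and stay on the FIFO path, which mirrors the abstract's point that scheduling overhead is not worth paying for low-latency tasks.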