Nalar: An agent serving framework
By: Marco Laju, Donghyun Son, Saurabh Agarwal, and more
Potential Business Impact:
Makes smart computer helpers run faster and better.
LLM-driven agentic applications increasingly automate complex, multi-step tasks, but serving them efficiently remains challenging due to heterogeneous components, dynamic and model-driven control flow, long-running state, and unpredictable latencies. Nalar is a ground-up agent-serving framework that cleanly separates workflow specification from execution while providing the runtime visibility and control needed for robust performance. Nalar preserves full Python expressiveness, using lightweight auto-generated stubs that turn agent and tool invocations into futures carrying dependency and context metadata. A managed state layer decouples logical state from physical placement, enabling safe reuse, migration, and consistent retry behavior. A two-level control architecture combines global policy computation with local event-driven enforcement to support adaptive routing, scheduling, and resource management across evolving workflows. Together, these mechanisms allow Nalar to deliver scalable, efficient, and policy-driven serving of heterogeneous agentic applications without burdening developers with orchestration logic. Across three agentic workloads, Nalar cuts tail latency by 34–74%, achieves up to 2.9× speedups, sustains 80 RPS where baselines fail, and scales to 130K futures with sub-500 ms control overhead.
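The abstract describes stubs that turn agent and tool invocations into futures carrying dependency and context metadata, but Nalar's actual API is not shown here. The following is a minimal, hypothetical Python sketch of that idea only: the names agent, Future, and resolve are illustrative assumptions, and the naive sequential resolver stands in for Nalar's real scheduler and control plane.

    # Hypothetical sketch of the stub-and-futures pattern; not Nalar's actual API.
    import functools
    import uuid
    from dataclasses import dataclass, field

    @dataclass
    class Future:
        """Placeholder result that records which upstream futures it depends on."""
        call_id: str
        fn: callable
        args: tuple
        kwargs: dict
        deps: list = field(default_factory=list)     # upstream futures (dependency metadata)
        context: dict = field(default_factory=dict)  # e.g. session or retry metadata

    def agent(fn):
        """Auto-generated stub: calling the decorated function returns a Future
        instead of running it, so the runtime sees the dependency graph."""
        @functools.wraps(fn)
        def stub(*args, **kwargs):
            deps = [a for a in list(args) + list(kwargs.values()) if isinstance(a, Future)]
            return Future(str(uuid.uuid4()), fn, args, kwargs, deps)
        return stub

    def resolve(value):
        """Naive sequential executor standing in for the framework's scheduler."""
        if isinstance(value, Future):
            args = [resolve(a) for a in value.args]
            kwargs = {k: resolve(v) for k, v in value.kwargs.items()}
            return value.fn(*args, **kwargs)
        return value

    @agent
    def search(query):
        return f"results for {query}"

    @agent
    def summarize(docs):
        return f"summary of ({docs})"

    # The workflow stays plain Python; futures capture the search -> summarize dependency.
    plan = summarize(search("agent serving"))
    print(resolve(plan))

The point of the pattern, as the abstract states it, is that the workflow remains ordinary Python while the runtime gets an explicit dependency graph it can schedule, route, and retry under its own policies.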
Similar Papers
ALAS: A Stateful Multi-LLM Agent Framework for Disruption-Aware Planning
Artificial Intelligence
Helps computers plan and fix mistakes better.
ALAS: Transactional and Dynamic Multi-Agent LLM Planning
Multiagent Systems
Fixes AI plans when they make mistakes.
Software-Defined Agentic Serving
Distributed, Parallel, and Cluster Computing
Makes smart computer teams work faster and smarter.