Towards Agentic OS: An LLM Agent Framework for Linux Schedulers
By: Yusheng Zheng, Yanpeng Hu, Wei Zhang and more
Potential Business Impact:
Makes computers run faster by learning what they need.
Operating system schedulers suffer from a fundamental semantic gap: kernel policies do not understand application-specific needs, which leads to suboptimal performance. We introduce SchedCP, the first framework that enables fully autonomous Large Language Model (LLM) agents to safely and efficiently optimize Linux schedulers without human involvement. Our core insight is that the challenge is not merely to apply a better LLM, but to architect a decoupled control plane that separates the AI's role of semantic reasoning ("what to optimize") from the system's role of execution ("how to observe and act"), thereby splitting the optimization problem into two stages: goal inference and policy synthesis. Implemented as a Model Context Protocol (MCP) server, SchedCP provides a stable interface with three key services: a Workload Analysis Engine, an evolving Scheduler Policy Repository, and an Execution Verifier that validates all AI-generated code and configurations with static and dynamic analysis before deployment. We demonstrate this architecture's power with sched-agent, a multi-agent system that autonomously analyzes workloads, synthesizes custom eBPF scheduling policies, and deploys them via the sched_ext infrastructure. Our evaluation shows that SchedCP achieves up to a 1.79x performance improvement and a 13x cost reduction compared to naive agentic approaches, all while maintaining a high success rate. By bridging the semantic gap, SchedCP democratizes expert-level system optimization and represents a step towards creating truly self-optimizing, application-aware operating systems. The code is open-sourced at https://github.com/eunomia-bpf/schedcp
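The two-stage split described in the abstract (goal inference, then policy synthesis, with verification before deployment) can be illustrated with a small toy sketch. Everything below is an assumption made for illustration: the names `Goal`, `Policy`, `infer_goal`, `synthesize_policy`, `verify`, and `optimize` are hypothetical and are not taken from the SchedCP code base, and the checks are placeholders rather than real static or dynamic analysis of eBPF programs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Goal:
    """Stage 1 output: what to optimize (hypothetical shape)."""
    metric: str  # e.g. "latency" or "throughput"


@dataclass
class Policy:
    """Stage 2 output: a candidate scheduler policy (hypothetical shape)."""
    name: str
    source: str  # stands in for generated eBPF source text


def infer_goal(workload_summary: str) -> Goal:
    # Placeholder for the agent's semantic reasoning step.
    if "interactive" in workload_summary:
        return Goal(metric="latency")
    return Goal(metric="throughput")


def synthesize_policy(goal: Goal) -> Policy:
    # Placeholder for policy synthesis; a real agent would emit eBPF code.
    return Policy(name=f"{goal.metric}_policy",
                  source=f"/* optimize {goal.metric} */")


def static_check(policy: Policy) -> bool:
    # Toy static check: reject empty source.
    return bool(policy.source.strip())


def dynamic_check(policy: Policy) -> bool:
    # Toy dynamic check: stands in for a sandboxed trial run.
    return "unsafe" not in policy.source


def verify(policy: Policy) -> bool:
    # Both checks must pass before anything is deployed.
    return static_check(policy) and dynamic_check(policy)


def optimize(workload_summary: str) -> Optional[Policy]:
    goal = infer_goal(workload_summary)   # stage 1: what to optimize
    policy = synthesize_policy(goal)      # stage 2: how to act
    return policy if verify(policy) else None
```

For instance, `optimize("interactive desktop session")` returns a policy named `latency_policy`, while a policy whose source is empty or contains the marker `unsafe` is rejected by `verify` and never returned for deployment. The point of the sketch is only the shape of the pipeline: reasoning about the goal is kept separate from generating and gating the policy.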
Similar Papers
Towards Agentic OS: An LLM Agent Framework for Linux Schedulers
Artificial Intelligence
Makes computers run programs much faster.
Experiences with Model Context Protocol Servers for Science and High Performance Computing
Distributed, Parallel, and Cluster Computing
Lets computers plan and do science experiments.
Learning Virtual Machine Scheduling in Cloud Computing through Language Agents
Machine Learning (CS)
Helps computers pack more tasks into cloud servers.