AI for Distributed Systems Design: Scalable Cloud Optimization Through Repeated LLMs Sampling And Simulators

Published: October 20, 2025 | arXiv ID: 2510.18897v1

By: Jacopo Tagliabue

Potential Business Impact:

AI helps design better computer systems faster.

Business Areas:

Simulation Software

We explore AI-driven distributed-systems policy design by combining stochastic code generation from large language models (LLMs) with deterministic verification in a domain-specific simulator. Using a Function-as-a-Service runtime (Bauplan) and its open-source simulator (Eudoxia) as a case study, we frame scheduler design as an iterative generate-and-verify loop: an LLM proposes a Python policy, the simulator evaluates it on standardized traces, and structured feedback steers subsequent generations. This setup preserves interpretability while enabling targeted search over a large design space. We detail the system architecture and report preliminary results on throughput improvements across multiple models. Beyond early gains, we discuss the limits of the current setup and outline next steps; in particular, we conjecture that AI will be crucial for scaling this methodology by helping to bootstrap new simulators.
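
The generate-and-verify loop described above can be sketched in a few lines of Python. This is a toy illustration built on stated assumptions and is not the paper's code. The LLM call is replaced by a stub (propose_policy) that randomly samples from a fixed pool of hand-written policy sources, and the simulator is a hypothetical single-queue model (simulate), not Eudoxia or any Bauplan API. All names (Job, CANDIDATES, propose_policy, simulate, generate_and_verify) are hypothetical. The score is the number of jobs completed by a fixed horizon, a simple stand-in for throughput.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    arrival: float   # time the job enters the queue
    duration: float  # service time on one worker


# Hypothetical stand-ins for LLM-generated policies: Python source strings that
# each define pick(queue, now) -> index of the queued job to run next.
CANDIDATES = [
    "def pick(queue, now):\n    return 0  # FIFO",
    "def pick(queue, now):\n"
    "    return min(range(len(queue)), key=lambda i: queue[i].duration)  # SJF",
    "def pick(queue, now):\n"
    "    return max(range(len(queue)), key=lambda i: queue[i].duration)  # LJF",
]


def propose_policy(rng, feedback):
    """Hypothetical stub for the stochastic LLM call. A real system would put
    `feedback` into the prompt; this stub ignores it and samples at random."""
    return rng.choice(CANDIDATES)


def simulate(policy_src, trace, workers=1, horizon=5.0):
    """Toy deterministic simulator (not Eudoxia): jobs completed by `horizon`."""
    namespace = {}
    exec(policy_src, namespace)  # untrusted code: sandbox this in practice
    pick = namespace["pick"]
    pending = sorted(trace, key=lambda j: j.arrival)
    queue, free_at, completed = [], [0.0] * workers, 0
    while pending or queue:
        w = min(range(workers), key=lambda i: free_at[i])  # earliest-free worker
        now = free_at[w]
        if now >= horizon:
            break
        while pending and pending[0].arrival <= now:  # admit arrived jobs
            queue.append(pending.pop(0))
        if not queue:
            free_at[w] = pending[0].arrival  # idle until the next arrival
            continue
        job = queue.pop(pick(queue, now))  # the candidate policy decides
        free_at[w] = now + job.duration
        if free_at[w] <= horizon:
            completed += 1
    return completed


def generate_and_verify(trace, rounds=10, seed=0):
    """Sample a policy, score it in the simulator, keep the best, and build a
    structured feedback string for the next round."""
    rng = random.Random(seed)
    best_src, best_score, feedback = None, -1, ""
    for r in range(rounds):
        src = propose_policy(rng, feedback)
        try:
            score = simulate(src, trace)
        except Exception as err:  # a broken policy becomes feedback, not a crash
            feedback = f"round {r}: policy raised {err!r}"
            continue
        if score > best_score:
            best_src, best_score = src, score
        feedback = f"round {r}: completed {score} jobs; best so far {best_score}"
    return best_src, best_score


if __name__ == "__main__":
    # One long job followed by four short ones, all arriving at t = 0.
    trace = [Job(0.0, 5.0)] + [Job(0.0, 1.0) for _ in range(4)]
    for src in CANDIDATES:
        print(simulate(src, trace), src.splitlines()[1].strip())
    print(generate_and_verify(trace, rounds=20))
```

On this toy trace, shortest-job-first completes 4 jobs by the horizon while FIFO and longest-job-first each complete 1, so the loop keeps the shortest-job-first source whenever it is sampled. In a real setup the feedback string would be passed back into the LLM prompt, and generated code would run in an isolated sandbox rather than through a bare exec call.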

Page Count

5 pages