LLMServingSim2.0: A Unified Simulator for Heterogeneous Hardware and Serving Techniques in LLM Infrastructure
By: Jaehong Cho, Hyunmin Choi, Jongse Park
Potential Business Impact:
Lets companies test new chips and software for making AI answer questions faster, without having to build the real systems first.
This paper introduces LLMServingSim2.0, a system simulator designed for exploring heterogeneous hardware in large-scale LLM serving systems. LLMServingSim2.0 addresses two key limitations of its predecessor: (1) integrating hardware models into system-level simulators is non-trivial due to the lack of a clear abstraction, and (2) existing simulators support only a narrow subset of serving techniques, leaving no infrastructure that captures the breadth of approaches in modern LLM serving. To overcome these issues, LLMServingSim2.0 adopts trace-driven performance modeling, accompanied by an operator-level latency profiler, enabling the integration of new accelerators with a single command. It further embeds up-to-date serving techniques while exposing flexible interfaces for request routing, cache management, and scheduling policies. In a TPU case study, our profiler requires 18.5x fewer lines of code (LoC) than the predecessor's hardware-simulator integration and outperforms it, demonstrating LLMServingSim2.0's low-effort hardware extensibility. Our experiments further show that LLMServingSim2.0 reproduces GPU-based LLM serving with 1.9% error while maintaining practical simulation time, making it a comprehensive platform for both hardware developers and LLM service providers.
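To make the trace-driven modeling idea concrete, here is a minimal Python sketch: profile each operator once on the target accelerator to build a latency table, then estimate a request's latency by replaying its operator trace against that table. All names here (`OpRecord`, `TraceDrivenModel`, the operators and latency values) are hypothetical illustrations under this assumption, not LLMServingSim2.0's actual API.

```python
from collections import namedtuple

# One profiled measurement: operator name, tensor shape, and the latency
# observed on a given accelerator (hypothetical schema for illustration).
OpRecord = namedtuple("OpRecord", ["op", "shape", "latency_us"])

class TraceDrivenModel:
    """Estimates end-to-end latency by summing profiled per-operator latencies."""

    def __init__(self):
        # (op, shape) -> measured latency in microseconds
        self.table = {}

    def load_profile(self, records):
        """Ingest operator-level measurements from a one-shot profiling run."""
        for r in records:
            self.table[(r.op, r.shape)] = r.latency_us

    def estimate(self, trace):
        """Sum the profiled latency of each operator in a request's execution trace."""
        return sum(self.table[(op, shape)] for op, shape in trace)

# Example: profile once on the target accelerator, then replay traces in simulation.
profile = [
    OpRecord("matmul", (4096, 4096), 120.0),
    OpRecord("softmax", (4096,), 8.5),
]
model = TraceDrivenModel()
model.load_profile(profile)
decode_step = [("matmul", (4096, 4096)), ("softmax", (4096,))]
print(model.estimate(decode_step))  # -> 128.5 (microseconds)
```

The appeal of this design is that adding a new accelerator only requires producing a new profile, not wiring a cycle-level hardware simulator into the system model.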
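The flexible policy interfaces could plausibly take the following shape: routing, cache management, and scheduling each defined as a small pluggable class, so users swap in their own logic without touching the rest of the simulator. The class and method names below are assumptions for illustration, not the simulator's real interface.

```python
from abc import ABC, abstractmethod
from collections import OrderedDict

class Scheduler(ABC):
    """Pluggable scheduling policy: choose which queued requests run next."""
    @abstractmethod
    def select(self, queue, batch_budget):
        ...

class FCFSScheduler(Scheduler):
    """First-come-first-served: admit requests in arrival order up to the budget."""
    def select(self, queue, batch_budget):
        return queue[:batch_budget]

class RoundRobinRouter:
    """Pluggable router: spread incoming requests across serving instances."""
    def __init__(self, num_instances):
        self.num_instances = num_instances
        self._next = 0

    def route(self, request):
        target = self._next
        self._next = (self._next + 1) % self.num_instances
        return target

class LRUCacheManager:
    """Pluggable KV-cache policy: evict the least recently used entry when full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def touch(self, request_id):
        # Refresh recency on access; evict the oldest entry when over capacity.
        self._entries.pop(request_id, None)
        self._entries[request_id] = True
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)
```

Under this sketch, evaluating a new policy means subclassing `Scheduler` (or replacing the router or cache class) and handing the instance to the simulator, leaving the serving pipeline itself unchanged.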
Similar Papers
Simulating LLM training workloads for heterogeneous compute and network infrastructure
Distributed, Parallel, and Cluster Computing
Makes AI training faster on mixed computer parts.
TokenSim: Enabling Hardware and Software Exploration for Large Language Model Inference Systems
Distributed, Parallel, and Cluster Computing
Makes AI answer questions much faster and cheaper.
From Principles to Practice: A Systematic Study of LLM Serving on Multi-core NPUs
Hardware Architecture
Makes AI answer questions faster on special chips.