DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving
By: Fengze Yu, Leshu Li, Brad McDanel, and more
Potential Business Impact:
Makes AI talk faster on many devices.
Large language model (LLM) inference often suffers from high decoding latency and limited scalability across heterogeneous edge-cloud environments. Existing speculative decoding (SD) techniques accelerate token generation but remain confined to single-node execution. We propose DSD, a distributed speculative decoding framework that extends SD to multi-device deployments through coordinated draft-target execution. Because no prior work simulates this paradigm, we first introduce DSD-Sim, a discrete-event simulator that captures network, batching, and scheduling dynamics. Building on insights from DSD-Sim, we further design an Adaptive Window Control (AWC) policy that dynamically adjusts the speculation window size to optimize throughput. Experiments across diverse workloads show that DSD achieves up to a 1.1x speedup and 9.7% higher throughput over existing SD baselines, enabling agile and scalable LLM serving across edge and cloud.
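To make the window-size tradeoff concrete, below is a minimal Python sketch of an acceptance-rate-driven window controller in the spirit of AWC. The class name, thresholds, and the toy acceptance simulation are all illustrative assumptions, not the paper's actual policy or DSD-Sim internals: the point is only that the speculation window should grow while draft tokens are mostly accepted and shrink when rejections waste target-model verification or network round trips.

```python
import random


class AdaptiveWindowController:
    """Minimal sketch of an adaptive speculation-window policy.

    Assumption: all names and thresholds here are illustrative; the
    paper's AWC algorithm is not reproduced. The heuristic: widen the
    window when drafts are mostly accepted, narrow it otherwise.
    """

    def __init__(self, min_window=1, max_window=16, target_accept=0.7):
        self.window = 4                     # draft tokens proposed per round
        self.min_window = min_window
        self.max_window = max_window
        self.target_accept = target_accept  # acceptance rate to sustain

    def update(self, accepted, proposed):
        # Acceptance rate of the last speculation round.
        rate = accepted / proposed if proposed else 0.0
        if rate >= self.target_accept:
            # Drafts are landing: speculate further ahead next round.
            self.window = min(self.max_window, self.window + 1)
        else:
            # Too many rejections: back off to cut wasted verification.
            self.window = max(self.min_window, self.window - 1)


def speculation_round(ctrl, accept_prob=0.75):
    """One simulated edge-cloud round: the edge drafts ctrl.window tokens,
    and the cloud target model accepts a prefix of them (toy model)."""
    k = ctrl.window
    accepted = 0
    while accepted < k and random.random() < accept_prob:
        accepted += 1                       # stand-in for target verification
    ctrl.update(accepted, k)
    return accepted + 1                     # accepted prefix + one corrected token


if __name__ == "__main__":
    random.seed(0)
    ctrl = AdaptiveWindowController()
    total = sum(speculation_round(ctrl) for _ in range(20))
    print(f"generated {total} tokens, final window = {ctrl.window}")
```

In a real edge-cloud deployment, each round would additionally pay a network round trip between the draft (edge) and target (cloud) models, which is why a larger window amortizes latency only as long as the acceptance rate holds up.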
Similar Papers
Speculative Decoding in Decentralized LLM Inference: Turning Communication Latency into Computation Throughput
Distributed, Parallel, and Cluster Computing
Makes AI talk faster when shared.
Efficient LLM Inference over Heterogeneous Edge Networks with Speculative Decoding
Systems and Control
Makes AI answer questions much faster.