Efficient Distributed Retrieval-Augmented Generation for Enhancing Language Model Performance

Published: April 15, 2025 | arXiv ID: 2504.11197v2

By: Shangyu Liu, Zhenzhe Zheng, Xiaoyao Huang, and more

Potential Business Impact:

Lets small on-device AI models draw on both public cloud knowledge and private on-device data, improving answers without sending private documents off the device.

Business Areas:
Augmented Reality Hardware, Software

Small language models (SLMs) support efficient deployment on resource-constrained edge devices, but their limited capacity compromises inference performance. Retrieval-augmented generation (RAG) is a promising solution to enhance model performance by integrating external databases, without requiring intensive on-device model retraining. However, large-scale public databases and user-specific private contextual documents typically reside on the cloud and the device separately, while existing RAG implementations are primarily centralized. To bridge this gap, we propose DRAGON, a distributed RAG framework that enhances on-device SLMs with both general and personal knowledge without the risk of leaking private documents. Specifically, DRAGON decomposes multi-document RAG into multiple parallel token generation processes performed independently and locally on the cloud and the device, and employs a newly designed Speculative Aggregation, a dual-side speculative algorithm that avoids frequent output synchronization between the cloud and the device. A new scheduling algorithm further identifies the optimal aggregation side based on real-time network conditions. Evaluations on a real-world hardware testbed demonstrate a significant performance improvement for DRAGON: up to 1.9x greater gains over a standalone SLM compared with centralized RAG, a substantial reduction in per-token latency, and negligible Time-to-First-Token (TTFT) overhead.
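
The abstract suggests that each side runs retrieval and decoding over its own documents and that the resulting per-token output distributions are then aggregated into a single next token. The following is a minimal Python sketch of that mixture-of-distributions idea only; the function name `aggregate_next_token`, the fixed mixing weight, and greedy decoding are illustrative assumptions, not the paper's Speculative Aggregation algorithm.

```python
import numpy as np

def aggregate_next_token(dist_cloud: np.ndarray,
                         dist_device: np.ndarray,
                         weight_cloud: float = 0.5) -> int:
    """Mix two next-token distributions and pick a token id.

    dist_cloud / dist_device: next-token probability vectors produced by
    RAG decoding over the cloud-side (public) and device-side (private)
    documents, respectively. The fixed weight is an assumption.
    """
    mixed = weight_cloud * dist_cloud + (1.0 - weight_cloud) * dist_device
    mixed /= mixed.sum()          # renormalize for numerical safety
    return int(np.argmax(mixed))  # greedy decoding, for simplicity

# Toy usage with a 5-token vocabulary:
cloud = np.array([0.10, 0.40, 0.20, 0.20, 0.10])   # from public documents
device = np.array([0.05, 0.15, 0.60, 0.10, 0.10])  # from private documents
print(aggregate_next_token(cloud, device))          # -> token id 2
```

In the framework described, this aggregation step would run on whichever side (cloud or device) the scheduling algorithm selects given current network conditions, with Speculative Aggregation reducing how often the two sides must synchronize their outputs.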

Country of Origin
🇨🇳 China

Page Count
13 pages

Category
Computer Science:
Machine Learning (CS)