Score: 1

Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches

Published: October 6, 2025 | arXiv ID: 2510.04905v1

By: Yicheng Tao, Yao Qin, Yepang Liu

Potential Business Impact:

Enables AI tools to generate and maintain code across entire software repositories, not just isolated functions or files.

Business Areas:
Augmented Reality Hardware, Software

Recent advancements in large language models (LLMs) have substantially improved automated code generation. While function-level and file-level generation have achieved promising results, real-world software development typically requires reasoning across entire repositories. This gives rise to the challenging task of Repository-Level Code Generation (RLCG), where models must capture long-range dependencies, ensure global semantic consistency, and generate coherent code spanning multiple files or modules. To address these challenges, Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm that integrates external retrieval mechanisms with LLMs, enhancing context-awareness and scalability. In this survey, we provide a comprehensive review of research on Retrieval-Augmented Code Generation (RACG), with an emphasis on repository-level approaches. We categorize existing work along several dimensions, including generation strategies, retrieval modalities, model architectures, training paradigms, and evaluation protocols. Furthermore, we summarize widely used datasets and benchmarks, analyze current limitations, and outline key challenges and opportunities for future research. Our goal is to establish a unified analytical framework for understanding this rapidly evolving field and to inspire continued progress in AI-powered software engineering.
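The RAG paradigm the abstract describes — retrieving relevant context from the repository and prepending it to the LLM's prompt — can be illustrated with a minimal sketch. Everything here is a toy assumption, not a method from the survey: a bag-of-words cosine retriever stands in for a real code retriever, and `build_prompt` stands in for whatever prompt format a given generator expects.

```python
from collections import Counter
import math


def tokenize(text: str) -> list[str]:
    """Crude identifier-level tokenizer; real systems use code-aware tokenizers."""
    return "".join(c if c.isalnum() else " " for c in text.lower()).split()


def cosine(a: list[str], b: list[str]) -> float:
    """Cosine similarity between two token bags."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, repo_snippets: list[str], k: int = 2) -> list[str]:
    """Rank repository snippets by similarity to the generation context."""
    q = tokenize(query)
    ranked = sorted(repo_snippets, key=lambda s: cosine(q, tokenize(s)), reverse=True)
    return ranked[:k]


def build_prompt(task: str, retrieved: list[str]) -> str:
    """Augment the generation task with retrieved repository context."""
    context = "\n\n".join(f"# Retrieved context:\n{s}" for s in retrieved)
    return f"{context}\n\n# Task:\n{task}"


# Toy "repository" of snippets the retriever searches over.
repo = [
    "def parse_config(path):\n    return json.load(open(path))",
    "def render_page(template, ctx):\n    return template.format(**ctx)",
    "def load_user_config(user):\n    return parse_config(f'/etc/{user}.json')",
]

top = retrieve("complete a function that loads config for a user", repo, k=1)
prompt = build_prompt("def get_config(user): ...", top)
```

In a full RACG pipeline, `prompt` would be passed to an LLM; repository-level systems replace the cosine retriever with dense embeddings, dependency graphs, or hybrid strategies, as surveyed in the paper.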

Country of Origin
🇺🇸 United States

Page Count
38 pages

Category
Computer Science:
Software Engineering