Score: 3

What to Retrieve for Effective Retrieval-Augmented Code Generation? An Empirical Study and Beyond

Published: March 26, 2025 | arXiv ID: 2503.20589v1

By: Wenchao Gu, Juntao Chen, Yanlin Wang, and more

BigTech Affiliations: Huawei

Potential Business Impact:

Identifies which kinds of retrieved context actually help LLMs write repository-level code, and uses that insight to retrieve the right APIs instead of noisy similar-code examples.

Business Areas:
Semantic Search, Internet Services

Repository-level code generation remains challenging due to complex code dependencies and the limitations of large language models (LLMs) in processing long contexts. While retrieval-augmented generation (RAG) frameworks are widely adopted, the effectiveness of different retrieved information sources (contextual code, APIs, and similar code snippets) has not been rigorously analyzed. Through an empirical study on two benchmarks, we demonstrate that in-context code and potential API information significantly enhance LLM performance, whereas retrieved similar code often introduces noise, degrading results by up to 15%. Based on these preliminary findings, we propose AllianceCoder, a novel context-integrated method that employs chain-of-thought prompting to decompose user queries into implementation steps and retrieves APIs via semantic description matching. In extensive experiments on CoderEval and RepoExec, AllianceCoder achieves state-of-the-art performance, improving Pass@1 by up to 20% over existing approaches.
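The retrieval step described in the abstract — matching decomposed implementation steps against natural-language descriptions of repository APIs — could be sketched as below. This is a minimal illustration, not the paper's implementation: the API names, descriptions, and the toy bag-of-words embedding are all assumptions (the actual method would presumably use a neural text encoder for the semantic matching).

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a semantic encoder: bag-of-words term frequencies.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_apis(step_descriptions, api_descriptions, top_k=2):
    """For each implementation step (produced by chain-of-thought query
    decomposition), rank repository APIs by description similarity and
    return the top_k candidates to include in the generation context."""
    api_vecs = {name: embed(desc) for name, desc in api_descriptions.items()}
    matches = {}
    for step in step_descriptions:
        sv = embed(step)
        ranked = sorted(api_vecs, key=lambda n: cosine(sv, api_vecs[n]),
                        reverse=True)
        matches[step] = ranked[:top_k]
    return matches

# Hypothetical repository APIs paired with natural-language descriptions.
apis = {
    "open_connection": "open a network connection to the database server",
    "run_query": "run a sql query against the database and return rows",
    "close_connection": "close the network connection to the database server",
}
# Hypothetical implementation steps decomposed from a user query.
steps = ["open a connection to the database",
         "execute a sql query and collect rows"]
print(retrieve_apis(steps, apis, top_k=1))
```

The key design point the sketch mirrors is that matching happens in description space rather than code space, which is what lets the method surface relevant APIs without pulling in the noisy similar-code snippets the empirical study found harmful.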

Country of Origin
🇨🇳 China, 🇩🇪 Germany

Page Count
12 pages

Category
Computer Science:
Software Engineering