CodeRAG: Supportive Code Retrieval on Bigraph for Real-World Code Generation

Published: April 14, 2025 | arXiv ID: 2504.10046v1

By: Jia Li, Xianjie Shi, Kechi Zhang, and more

Potential Business Impact:

Helps computers write complex code by finding examples.

Business Areas:

Augmented Reality Hardware, Software

Large language models (LLMs) have shown promising performance in automated code generation, excelling especially in simple tasks such as generating standalone code. Unlike these simple tasks, real-world code generation usually depends on a specific programming environment (e.g., a code repository). Such an environment contains complex dependencies and domain knowledge, which LLMs need when generating target code snippets. In this paper, we propose CodeRAG, a retrieval-augmented code generation (RAG) framework that comprehensively retrieves supportive code for real-world code generation. Beginning with the requirement, CodeRAG first constructs a requirement graph for the current repository and retrieves the sub-requirement and similar-requirement nodes of the target requirement on that graph. Meanwhile, it models the repository as a DS-code graph. CodeRAG then maps the relevant requirement nodes to their corresponding code nodes and treats these code nodes as anchors for LLM reasoning on the DS-code graph. Finally, CodeRAG introduces a code-oriented agentic reasoning process that allows LLMs to reason about and comprehensively retrieve the supportive code they need to generate correct programs. Experiments show that CodeRAG achieves significant improvements over no-RAG scenarios (i.e., increasing Pass@1 by 40.90 on GPT-4o and 37.79 on Gemini-Pro on DevEval). Further tests on reasoning LLMs (i.e., QwQ-32B) confirm CodeRAG's adaptability and efficacy across various types of LLMs. In addition, CodeRAG outperforms commercial programming products such as Copilot and Cursor. We further investigate the performance of our framework on different dependency types and observe that CodeRAG is superior on examples where the target code invokes predefined cross-file code snippets. These results demonstrate CodeRAG's potential in solving real-world repo-level coding challenges.
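
The retrieval flow described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the graph contents, the names `requirement_graph`, `requirement_to_code`, `code_graph`, `retrieve_anchors`, and `expand_from_anchors` are all hypothetical, and the breadth-first expansion is a simple stand-in for the LLM-driven agentic reasoning step, which the abstract does not specify in detail.

```python
from collections import deque

# Hypothetical requirement graph: each requirement lists its related
# requirements, tagged as "sub" (sub-requirement) or "similar".
requirement_graph = {
    "parse config file": [("sub", "read file"), ("similar", "parse env file")],
    "read file": [],
    "parse env file": [],
}

# Hypothetical mapping from requirement nodes to code nodes.
requirement_to_code = {
    "read file": "io_utils.read_text",
    "parse env file": "env.parse_env",
}

# Hypothetical code graph: each code node lists the code nodes it depends on.
code_graph = {
    "io_utils.read_text": ["io_utils.open_safe"],
    "env.parse_env": ["env.split_pair"],
    "io_utils.open_safe": [],
    "env.split_pair": [],
}


def retrieve_anchors(target):
    """Map the sub- and similar-requirement nodes of `target` to code anchors."""
    anchors = []
    for _relation, requirement in requirement_graph.get(target, []):
        code_node = requirement_to_code.get(requirement)
        if code_node is not None and code_node not in anchors:
            anchors.append(code_node)
    return anchors


def expand_from_anchors(anchors, max_hops=1):
    """Collect code nodes within `max_hops` of the anchors (breadth-first).

    Assumption: a fixed-depth traversal replaces the agentic reasoning
    process, in which an LLM would decide which neighbours to follow.
    """
    collected = list(anchors)  # keeps discovery order
    queue = deque((anchor, 0) for anchor in anchors)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for neighbour in code_graph.get(node, []):
            if neighbour not in collected:
                collected.append(neighbour)
                queue.append((neighbour, depth + 1))
    return collected


anchors = retrieve_anchors("parse config file")
supportive_code = expand_from_anchors(anchors)
```

In this toy setup the anchors are the code nodes mapped from the related requirements, and `supportive_code` additionally contains their direct dependencies; in a real system these snippets would be placed into the LLM prompt alongside the target requirement.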

CodeRAG: Finding Relevant and Necessary Knowledge for Retrieval-Augmented Repository-Level Code Completion

Computation and Language

Helps computers write code faster and better.

19 Sep 2025 2

92%

Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches

Software Engineering

Helps computers write complex software code.

6 Oct 2025 1

92%

A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models

Computation and Language

Helps computers understand complex topics better.

21 Jan 2025 2

Country of Origin

🇨🇳 China

Page Count

14 pages
