RepoShapley: Shapley-Enhanced Context Filtering for Repository-Level Code Completion
By: Yu Huo, Siyu Zhang, Kun Zeng and more
Potential Business Impact:
Helps computers write better code by picking the best clues.
Repository-level code completion benefits from retrieval-augmented generation (RAG). However, controlling cross-file evidence is difficult because chunk utility is often interaction-dependent: some snippets help only when paired with complementary context, while others harm decoding when they conflict. We propose RepoShapley, a coalition-aware context filtering framework supervised by Shapley-style marginal contributions. Our module ChunkShapley constructs offline labels by (i) single-chunk probing with teacher-forced likelihood to estimate signed, weighted effects, (ii) a surrogate game that captures saturation and interference, (iii) exact Shapley computation for small retrieval sets, and (iv) bounded post-verification that selects a decoding-optimal coalition using the frozen generator. We distill verified $KEEP$ or $DROP$ decisions and retrieval triggering into a single model via discrete control tokens. Experiments across benchmarks and backbones show that RepoShapley improves completion quality while reducing harmful context and unnecessary retrieval. Code: https://anonymous.4open.science/r/a7f3c9.
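To make step (iii) of the abstract concrete, the following is a minimal sketch of exact Shapley computation over a small set of retrieved chunks. It is not the authors' implementation. The function `exact_shapley`, the game `toy_value`, the chunk names `a`, `b`, `c`, and the zero threshold used for labeling are all illustrative assumptions. The toy game stands in for the paper's surrogate game: chunks `a` and `b` help only when paired (a complementary interaction), while `c` always hurts.

```python
from itertools import combinations
from math import factorial


def exact_shapley(players, value):
    """Exact Shapley values by enumerating every coalition.

    Cost grows as O(n * 2^(n-1)) value calls, so this is only
    practical for small retrieval sets.
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Standard Shapley weight for a coalition of size k that excludes p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when added to coalition s.
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


def toy_value(coalition):
    """Hypothetical coalition utility (an assumption, not the paper's game)."""
    v = 0.0
    if "a" in coalition and "b" in coalition:
        v += 1.0  # a and b are useful only together
    if "c" in coalition:
        v -= 0.5  # c conflicts with the target and lowers utility
    return v


phi = exact_shapley(["a", "b", "c"], toy_value)

# Assumed labeling rule for illustration: keep chunks with positive value.
labels = {chunk: ("KEEP" if score > 0 else "DROP") for chunk, score in phi.items()}
print(phi)
print(labels)
```

In this toy game `a` and `b` each receive 0.5 and `c` receives -0.5, so the values sum to the utility of the full set (0.5). Thresholding at zero is only a stand-in here: the abstract states that the final coalition is chosen by bounded post-verification with the frozen generator.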
Similar Papers
Impact-driven Context Filtering For Cross-file Code Completion
Software Engineering
Helps computers write better code by picking good examples.
Relative Positioning Based Code Chunking Method For Rich Context Retrieval In Repository Level Code Completion Task With Code Language Model
Software Engineering
Helps computers finish writing code faster.
Practical Code RAG at Scale: Task-Aware Retrieval Design Choices under Compute Budgets
Machine Learning (CS)
Helps computers write better code faster.