LlavaCode: Compressed Code Representations for Retrieval-Augmented Code Generation
By: Daria Cherniuk, Nikita Sukhorukov, Nikita Sushko, and more
Potential Business Impact:
Makes code writing faster and smarter.
Retrieval-augmented generation has emerged as one of the most effective approaches for code completion, particularly when context from the surrounding repository is essential. However, incorporating context significantly extends sequence length, leading to slower inference, a critical limitation for interactive settings such as IDEs. In this work, we introduce LlavaCode, a framework that compresses code into compact, semantically rich representations interpretable by code LLMs, enhancing generation quality while reducing the retrieved context to only a few compressed single-token vectors. Using a small projector module, we can significantly increase the Exact Match (EM) and Edit Similarity (ES) metrics of the coding model with a negligible increase in latency. Our experiments demonstrate that compressed context enables a 20-38% reduction in Time-to-First-Token (TTFT) on line completion tasks compared to full-RAG pipelines.
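To make the idea concrete, the core mechanism can be sketched as follows: each retrieved snippet is encoded to a single vector, and a small trained projector maps that vector into the code LLM's embedding space, yielding one "soft token" per snippet that is prepended to the prompt embeddings. This is a minimal numpy sketch, not the paper's implementation; the dimensions, the two-layer MLP projector shape, and the random weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: the retrieval encoder's output dimension and the
# code LLM's hidden dimension (illustrative, not taken from the paper).
ENC_DIM, LLM_DIM = 384, 1024

def project_context(snippet_embeddings, w1, b1, w2, b2):
    """Map each retrieved-snippet embedding to one 'soft token' in the
    LLM's embedding space via a small two-layer MLP projector."""
    h = np.maximum(snippet_embeddings @ w1 + b1, 0.0)  # ReLU hidden layer
    return h @ w2 + b2                                 # one vector per snippet

# Randomly initialized projector weights (in practice these are trained
# so the LLM can interpret the compressed vectors).
w1 = rng.standard_normal((ENC_DIM, 512)) * 0.02
b1 = np.zeros(512)
w2 = rng.standard_normal((512, LLM_DIM)) * 0.02
b2 = np.zeros(LLM_DIM)

# Three retrieved snippets, each already encoded to a single ENC_DIM vector.
snippets = rng.standard_normal((3, ENC_DIM))
soft_tokens = project_context(snippets, w1, b1, w2, b2)

# Prepend the 3 compressed tokens to a 10-token prompt: the LLM attends over
# 13 embeddings instead of 10 + hundreds of raw retrieved-context tokens.
prompt_embeddings = rng.standard_normal((10, LLM_DIM))
llm_input = np.concatenate([soft_tokens, prompt_embeddings], axis=0)
print(llm_input.shape)  # (13, 1024)
```

The latency benefit follows directly from the shape arithmetic: prefill cost grows with sequence length, so replacing hundreds of retrieved tokens with a handful of projected vectors shrinks the prompt the model must process before emitting its first token.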
Similar Papers
ReCode: Improving LLM-based Code Repair with Fine-Grained Retrieval-Augmented Generation
Software Engineering
Fixes computer code faster and cheaper.
LongCodeZip: Compress Long Context for Code Language Models
Computation and Language
Makes computer programs understand more code faster.
Completion by Comprehension: Guiding Code Generation with Multi-Granularity Understanding
Software Engineering
Helps computers write better code by understanding its structure.