SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression
By: Yiqiao Jin, Kartik Sharma, Vineeth Rakesh, and more
Potential Business Impact:
Makes AI answers more accurate by using retrieved information more efficiently under tight context limits.
Retrieval-augmented Generation (RAG) extends large language models (LLMs) with external knowledge but faces key challenges: restricted effective context length and redundancy in retrieved documents. Pure compression-based approaches reduce input size but often discard fine-grained details essential for factual accuracy. We propose SARA, a unified RAG framework that balances local precision and global knowledge coverage under tight context budgets. SARA combines natural-language text snippets with semantic compression vectors to jointly enhance context efficiency and answer correctness. It represents contexts at two complementary levels: 1) fine-grained natural-language spans that preserve critical entities and numerical values, and 2) compact, interpretable vectors that summarize high-level semantics. An iterative evidence-selection module employs the compression vectors for dynamic reranking of contexts. Across 9 datasets and 5 open-source LLMs spanning 3 model families (Mistral, Llama, and Gemma), SARA consistently improves answer relevance (+17.71), answer correctness (+13.72), and semantic similarity (+15.53), demonstrating the importance of integrating textual and compressed representations for robust, context-efficient RAG.
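To make the two-level design concrete, here is a minimal Python sketch of how compression vectors might drive iterative evidence selection under a token budget. Everything in it (the `Context` fields, the cosine-based reranking, the redundancy heuristic) is an illustrative assumption, not the paper's actual implementation.

```python
# Hypothetical sketch of SARA-style evidence selection; names, dimensions,
# and the scoring rule are assumptions for illustration only.
from dataclasses import dataclass

import numpy as np


@dataclass
class Context:
    span: str           # fine-grained natural-language span (entities, numbers)
    vector: np.ndarray  # compact semantic compression vector
    tokens: int         # approximate token cost of including the span verbatim


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))


def select_evidence(query_vec: np.ndarray, contexts: list, budget: int) -> list:
    """Iteratively rerank contexts by their compression vectors and keep the
    highest-scoring spans that still fit within the token budget."""
    selected, used = [], 0
    remaining = list(contexts)
    direction = query_vec.astype(float).copy()
    while remaining:
        # Rerank against the residual query direction so each new pick
        # complements what was already selected (a redundancy heuristic,
        # standing in for the paper's iterative evidence-selection module).
        remaining.sort(key=lambda c: cosine(direction, c.vector), reverse=True)
        best = remaining.pop(0)
        if used + best.tokens > budget:
            continue  # span too long for the remaining budget; try the next
        selected.append(best)
        used += best.tokens
        # Remove the query component already covered by the chosen vector.
        v = best.vector
        direction = direction - (direction @ v) / (v @ v + 1e-9) * v
    return selected


# Toy usage: random embeddings stand in for learned compression vectors.
rng = np.random.default_rng(0)
query = rng.normal(size=64)
ctxs = [Context(f"span {i}", rng.normal(size=64), tokens=40) for i in range(8)]
print([c.span for c in select_evidence(query, ctxs, budget=120)])
```

Subtracting the covered component before each rerank is one simple way to trade off local precision against global coverage within the budget, which is the balance the abstract describes.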
Similar Papers
Enhancing RAG Efficiency with Adaptive Context Compression
Computation and Language
Makes AI answer questions faster and smarter.
SAGE: A Framework of Precise Retrieval for RAG
Machine Learning (CS)
Helps computers answer questions more accurately.
ParetoRAG: Leveraging Sentence-Context Attention for Robust and Efficient Retrieval-Augmented Generation
Computation and Language
Makes AI smarter by finding better information.