CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning
By: Jie He, Richard He Bai, Sinead Williamson, and more
Potential Business Impact:
Makes AI answer questions more accurately by compressing retrieved documents into compact vectors and training retrieval and generation together.
Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge but still suffers from long contexts and disjoint retrieval-generation optimization. In this work, we propose CLaRa (Continuous Latent Reasoning), a unified framework that performs embedding-based compression and joint optimization in a shared continuous space. To obtain semantically rich and retrievable compressed vectors, we introduce SCP, a key-preserving data synthesis framework using QA and paraphrase supervision. CLaRa then trains the reranker and generator end-to-end via a single language modeling loss, with gradients flowing through both modules using a differentiable top-k estimator. Theoretically, this unified optimization aligns retrieval relevance with answer quality. Experiments across multiple QA benchmarks show that CLaRa achieves state-of-the-art compression and reranking performance, often surpassing text-based fine-tuned baselines.
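The joint optimization described in the abstract hinges on a differentiable top-k estimator that lets the generator's language-modeling loss backpropagate into the reranker. The sketch below illustrates one common way to build such an estimator, a straight-through softmax relaxation in PyTorch; the function name, tensor shapes, and the stand-in loss are illustrative assumptions for this sketch, not the paper's actual implementation, which may use a different relaxation.

```python
import torch
import torch.nn.functional as F

def straight_through_topk(scores: torch.Tensor, k: int, tau: float = 1.0) -> torch.Tensor:
    """Differentiable top-k: hard 0/1 selection in the forward pass,
    softmax gradients in the backward pass (straight-through trick)."""
    soft = F.softmax(scores / tau, dim=-1)           # smooth surrogate
    hard = torch.zeros_like(soft)
    hard.scatter_(-1, scores.topk(k).indices, 1.0)   # exact top-k mask
    # Numerically equal to `hard`, but gradients flow through `soft`.
    return hard + soft - soft.detach()

# Toy end-to-end pass with random tensors (all shapes are illustrative).
num_docs, k, latent_dim = 20, 5, 512
doc_embs = torch.randn(num_docs, latent_dim)         # compressed document vectors
scores = torch.randn(num_docs, requires_grad=True)   # stand-in reranker scores
weights = straight_through_topk(scores, k)           # (num_docs,) selection mask
context = (weights[:, None] * doc_embs).sum(dim=0)   # soft-selected latent context
loss = context.pow(2).mean()                         # stand-in for the LM loss
loss.backward()
print(scores.grad is not None)  # True: the loss also trains the reranker
```

In the forward pass the mask is exactly 0/1 over the top-k documents, so the generator sees a hard selection; in the backward pass the gradient of the softmax surrogate reaches the reranker scores, which is what allows a single language-modeling loss to align retrieval relevance with answer quality.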
Similar Papers
CL-RAG: Bridging the Gap in Retrieval-Augmented Generation with Curriculum Learning
Computation and Language
Teaches AI to learn from easy to hard questions.
Probing Latent Knowledge Conflict for Faithful Retrieval-Augmented Generation
Computation and Language
Makes AI answers more truthful and less wrong.