Adapting Large Language Models for Multi-Domain Retrieval-Augmented-Generation
By: Alexandre Misrahi, Nadezhda Chirkova, Maxime Louis, and more
Potential Business Impact:
Makes AI better at answering questions across many topics.
Retrieval-Augmented Generation (RAG) enhances LLM factuality, but multi-domain applications face challenges such as a lack of diverse benchmarks and poor out-of-domain generalization. The first contribution of this work is a diverse benchmark comprising a variety of question-answering tasks from 8 sources and covering 13 domains. Our second contribution is a systematic evaluation of out-of-domain generalization for typical RAG tuning strategies. While our findings reveal that standard fine-tuning fails to generalize effectively, we show that sequence-level distillation with teacher-generated labels improves out-of-domain performance by providing more coherent supervision. Our findings highlight key strategies for improving multi-domain RAG robustness.
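The distillation idea above can be sketched as a data-construction step: instead of fine-tuning the student on gold answers, each training target is replaced with the teacher model's own generation over the retrieved context. The sketch below is illustrative only; `teacher_generate`, the prompt template, and the retriever interface are hypothetical stand-ins, not the paper's actual implementation.

```python
# Minimal sketch of sequence-level distillation data construction for RAG.
# `teacher_generate` is a placeholder for a call to a larger teacher LLM;
# the prompt format is an assumption, not the paper's exact template.

def build_prompt(question, passages):
    """Concatenate retrieved passages with the question, RAG-style."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

def teacher_generate(prompt):
    # Placeholder for the teacher LLM's generated answer (e.g. greedy decoding).
    return "teacher answer for: " + prompt.splitlines()[-2]

def make_distillation_examples(qa_pairs, retriever):
    """Build (prompt, teacher output) pairs for student fine-tuning."""
    examples = []
    for question, _gold_answer in qa_pairs:
        passages = retriever(question)
        prompt = build_prompt(question, passages)
        # Key point: the student trains on the teacher's generation,
        # not on the gold label, giving more coherent supervision.
        examples.append({"input": prompt, "target": teacher_generate(prompt)})
    return examples
```

The student would then be fine-tuned with standard cross-entropy on these (input, target) pairs, exactly as in ordinary supervised fine-tuning, only with teacher-generated targets.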
Similar Papers
Multi-task retriever fine-tuning for domain-specific and efficient RAG
Computation and Language
One smart tool helps computers answer many questions.
Investigating the Robustness of Retrieval-Augmented Generation at the Query Level
Computation and Language
Makes AI smarter by improving how it finds answers.
Systematic Knowledge Injection into Large Language Models via Diverse Augmentation for Domain-Specific RAG
Computation and Language
Teaches AI to use facts better, even when some are wrong.