Adapting Large Language Models for Multi-Domain Retrieval-Augmented-Generation

Published: April 3, 2025 | arXiv ID: 2504.02411v1

By: Alexandre Misrahi, Nadezhda Chirkova, Maxime Louis, and more

Potential Business Impact:

Makes AI better at answering questions accurately across many different topics.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Retrieval-Augmented Generation (RAG) enhances LLM factuality, but multi-domain applications face challenges such as a lack of diverse benchmarks and poor out-of-domain generalization. The first contribution of this work is a diverse benchmark comprising a variety of question-answering tasks from 8 sources and covering 13 domains. Our second contribution is to systematically test out-of-domain generalization for typical RAG tuning strategies. While our findings reveal that standard fine-tuning fails to generalize effectively, we show that sequence-level distillation with teacher-generated labels improves out-of-domain performance by providing more coherent supervision. Our findings highlight key strategies for improving multi-domain RAG robustness.
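The abstract's key technique is sequence-level distillation: rather than fine-tuning on gold answers, the student model is trained on answers generated by a stronger teacher given the same retrieved context. Below is a minimal sketch of that idea, assuming Hugging Face transformers; the model names, the `build_prompt` helper, and the training-loop shape are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of sequence-level distillation for RAG fine-tuning.
# Assumption: teacher and student share a tokenizer (e.g., same model family).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

TEACHER_NAME = "meta-llama/Llama-2-13b-chat-hf"  # hypothetical teacher
STUDENT_NAME = "meta-llama/Llama-2-7b-hf"        # hypothetical student

tok = AutoTokenizer.from_pretrained(STUDENT_NAME)
teacher = AutoModelForCausalLM.from_pretrained(TEACHER_NAME).eval()
student = AutoModelForCausalLM.from_pretrained(STUDENT_NAME)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

def build_prompt(question: str, passages: list[str]) -> str:
    # Standard RAG prompt: retrieved passages prepended to the question.
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

def distill_step(question: str, passages: list[str]) -> float:
    prompt = build_prompt(question, passages)
    inputs = tok(prompt, return_tensors="pt")

    # 1) Teacher generates the target sequence (the "teacher-generated label").
    with torch.no_grad():
        gen = teacher.generate(**inputs, max_new_tokens=128, do_sample=False)
    full = gen[0]  # prompt tokens followed by generated answer tokens

    # 2) Student is fine-tuned on the teacher's output with plain
    #    cross-entropy; prompt positions are masked out of the loss.
    labels = full.clone()
    labels[: inputs["input_ids"].shape[1]] = -100  # -100 = ignored by HF loss
    out = student(input_ids=full.unsqueeze(0), labels=labels.unsqueeze(0))
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```

The intuition, per the abstract, is that teacher outputs give more coherent supervision than heterogeneous gold labels drawn from many domains, which is what standard fine-tuning struggles with out of domain.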

Country of Origin
🇨🇭 Switzerland

Repos / Data Links

Page Count
25 pages

Category
Computer Science: Computation and Language