Diverse And Private Synthetic Datasets Generation for RAG evaluation: A multi-agent framework
By: Ilias Driouich, Hongliu Cao, Eoin Thomas
Potential Business Impact:
Makes AI safer by hiding private info.
Retrieval-augmented generation (RAG) systems improve large language model outputs by incorporating external knowledge, enabling more informed and context-aware responses. However, the effectiveness and trustworthiness of these systems critically depend on how they are evaluated, particularly on whether the evaluation process captures real-world constraints such as protecting sensitive information. While current evaluation efforts for RAG systems have focused primarily on developing performance metrics, far less attention has been given to the design and quality of the underlying evaluation datasets, despite their pivotal role in enabling meaningful, reliable assessments. In this work, we introduce a novel multi-agent framework for generating synthetic QA datasets for RAG evaluation that prioritize semantic diversity and privacy preservation. Our approach involves: (1) a Diversity Agent that leverages clustering techniques to maximize topical coverage and semantic variability, (2) a Privacy Agent that detects and masks sensitive information across multiple domains, and (3) a QA Curation Agent that synthesizes private and diverse QA pairs suitable as ground truth for RAG evaluation. Extensive experiments demonstrate that our evaluation sets outperform baseline methods in diversity and achieve robust privacy masking on domain-specific datasets. This work offers a practical and ethically aligned pathway toward safer, more comprehensive RAG system evaluation, laying the foundation for future enhancements aligned with evolving AI regulations and compliance standards.
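The first two stages described in the abstract (diversity selection, then privacy masking) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the clustering method or the privacy detector, so the sketch swaps in a simple leader-style clustering over bag-of-words vectors and two regular expressions. All names (`select_diverse`, `mask_sensitive`, `build_contexts`), the similarity threshold, and the `[EMAIL]`/`[PHONE]` placeholders are hypothetical. The QA curation step, which would presumably call an LLM, is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical patterns for illustration only; a real privacy agent would
# need to cover many more entity types and domains.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def bag_of_words(text):
    """Lowercased word counts, used here as a crude stand-in for embeddings."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_diverse(chunks, threshold=0.5):
    """Leader-style clustering: a chunk starts a new cluster only if it is
    dissimilar to every existing cluster leader. One leader per cluster is kept."""
    leaders = []  # list of (vector, chunk) pairs
    for chunk in chunks:
        vec = bag_of_words(chunk)
        if all(cosine(vec, leader_vec) < threshold for leader_vec, _ in leaders):
            leaders.append((vec, chunk))
    return [chunk for _, chunk in leaders]


def mask_sensitive(text):
    """Replace detected e-mail addresses and phone numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def build_contexts(chunks, threshold=0.5):
    """Diversity selection followed by masking; QA generation is left out."""
    return [mask_sensitive(c) for c in select_diverse(chunks, threshold)]


if __name__ == "__main__":
    chunks = [
        "Contact the hotel at booking@example.com to change a reservation.",
        "To change a reservation, contact the hotel at booking@example.com today.",
        "Flight delays are reported by phone on +1 555 010 9999.",
    ]
    for context in build_contexts(chunks):
        print(context)
```

In this toy input the second chunk is a near-duplicate of the first and is dropped, while the third covers a different topic and is kept; both surviving chunks are masked before any QA pair would be generated from them.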
Similar Papers
LiveRAG: A diverse Q&A dataset with varying difficulty level for RAG evaluation
Computation and Language
Tests AI to answer questions better.
Differentially Private Synthetic Text Generation for Retrieval-Augmented Generation (RAG)
Cryptography and Security
Keeps private information safe when AI learns.
Can we Evaluate RAGs with Synthetic Data?
Computation and Language
Makes AI answer questions better, but not always.