mmRAG: A Modular Benchmark for Retrieval-Augmented Generation over Text, Tables, and Knowledge Graphs
By: Chuan Xu, Qiaosheng Chen, Yutong Feng, and more
Potential Business Impact:
Tests AI that finds answers in text, tables, and knowledge graphs.
Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm for enhancing the capabilities of large language models. However, existing RAG evaluation predominantly focuses on text retrieval and relies on opaque, end-to-end assessments of generated outputs. To address these limitations, we introduce mmRAG, a modular benchmark designed for evaluating multi-modal RAG systems. Our benchmark integrates queries from six diverse question-answering datasets spanning text, tables, and knowledge graphs, which we uniformly convert into retrievable documents. To enable direct, granular evaluation of individual RAG components -- such as the accuracy of retrieval and query routing -- beyond end-to-end generation quality, we follow standard information retrieval procedures to annotate document relevance and derive dataset relevance. We establish baseline performance by evaluating a wide range of RAG implementations on mmRAG.
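To make the component-level evaluation concrete, here is a minimal Python sketch of how retrieval quality and query-routing accuracy might be scored against annotated relevance labels. The qrels/run data structures, function names, and metric choices (nDCG@k, routing accuracy) are illustrative assumptions, not mmRAG's actual API or released code.

```python
# Minimal sketch: scoring individual RAG components against relevance labels.
# Data formats and names below are assumptions for illustration only.
import math
from typing import Dict, List

Qrels = Dict[str, Dict[str, int]]   # query_id -> {doc_id: graded relevance}
Run = Dict[str, List[str]]          # query_id -> ranked list of retrieved doc_ids


def ndcg_at_k(qrels: Qrels, run: Run, k: int = 10) -> float:
    """Mean nDCG@k over queries, using graded document-relevance annotations."""
    scores = []
    for qid, ranking in run.items():
        rels = qrels.get(qid, {})
        dcg = 0.0
        for rank, doc_id in enumerate(ranking[:k]):
            rel = rels.get(doc_id, 0)
            if rel > 0:
                dcg += rel / math.log2(rank + 2)
        ideal = sorted(rels.values(), reverse=True)[:k]
        idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
        scores.append(dcg / idcg if idcg > 0 else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def routing_accuracy(gold: Dict[str, str], predicted: Dict[str, str]) -> float:
    """Fraction of queries routed to the correct source dataset (text/table/KG)."""
    if not gold:
        return 0.0
    hits = sum(predicted.get(qid) == dataset for qid, dataset in gold.items())
    return hits / len(gold)
```

Separating these scores from end-to-end answer quality is what lets a modular benchmark attribute failures to the retriever or the router rather than to the generator.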
Similar Papers
Benchmarking Retrieval-Augmented Generation for Chemistry
Computation and Language
Helps computers answer chemistry questions better.
MMKB-RAG: A Multi-Modal Knowledge-Based Retrieval-Augmented Generation Framework
Artificial Intelligence
Helps AI find better, truer answers.
Multimodal Iterative RAG for Knowledge Visual Question Answering
CV and Pattern Recognition
Helps computers answer harder questions using more information.