BioMol-MQA: A Multi-Modal Question Answering Dataset For LLM Reasoning Over Bio-Molecular Interactions
By: Saptarshi Sengupta, Shuhua Yang, Paul Kwong Yu, and more
Potential Business Impact:
Helps computers answer medical questions by combining text, knowledge graphs, and molecular structures.
Retrieval-augmented generation (RAG) has proven powerful for improving large language models (LLMs). However, most existing RAG-based LLMs retrieve information from a single modality, mainly text, while for many real-world problems, such as healthcare, information relevant to a query can manifest in multiple modalities, such as knowledge graphs, text (e.g., clinical notes), and complex molecular structures. The ability to retrieve relevant multimodal, domain-specific information and to reason over and synthesize this diverse knowledge into an accurate response is therefore important. To address this gap, we present BioMol-MQA, a new question-answering (QA) dataset on polypharmacy composed of two parts: (i) a multimodal knowledge graph (KG) with text and molecular structures for information retrieval, and (ii) challenging questions designed to test LLM capabilities in retrieving and reasoning over the multimodal KG. Our benchmarks indicate that existing LLMs struggle to answer these questions and do well only when given the necessary background data, signaling the need for strong RAG frameworks.
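The paper contributes a benchmark rather than a method, but to make the retrieve-then-reason setup concrete, below is a minimal Python sketch of RAG over a multimodal KG whose nodes carry both a text description and a molecular structure (as a SMILES string). Everything here is an illustrative assumption, not the paper's pipeline: the names (KGNode, keyword_score, retrieve, build_prompt), the lexical scorer, and the toy drug entries are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class KGNode:
    """A KG node carrying two modalities: text and molecular structure."""
    name: str
    description: str  # text modality (e.g., mechanism notes)
    smiles: str       # structure modality as a SMILES string
    neighbors: list = field(default_factory=list)  # interacting drugs

def tokens(s: str) -> set:
    """Lowercase, whitespace-split tokens with light punctuation stripping."""
    return {w.strip(".,?()") for w in s.lower().split()}

def keyword_score(query: str, text: str) -> int:
    """Crude lexical-overlap score; a real system would use dense embeddings."""
    return len(tokens(query) & tokens(text))

def retrieve(kg: dict, query: str, k: int = 2) -> list:
    """Rank nodes by overlap between the query and their text descriptions."""
    ranked = sorted(kg.values(),
                    key=lambda n: keyword_score(query, n.description),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, hits: list) -> str:
    """Assemble retrieved multimodal context (text + SMILES + KG edges)."""
    ctx = []
    for node in hits:
        ctx.append(f"{node.name}: {node.description}\n"
                   f"  structure (SMILES): {node.smiles}\n"
                   f"  interacts with: {', '.join(node.neighbors) or 'none'}")
    return "Context:\n" + "\n".join(ctx) + f"\n\nQuestion: {query}\nAnswer:"

# Toy KG with hypothetical entries, not data from BioMol-MQA itself.
kg = {
    "warfarin": KGNode("warfarin", "anticoagulant metabolized by CYP2C9",
                       "CC(=O)CC(c1ccccc1)c1c(O)c2ccccc2oc1=O",
                       ["fluconazole"]),
    "fluconazole": KGNode("fluconazole", "antifungal that inhibits CYP2C9",
                          "OC(Cn1cncn1)(Cn1cncn1)c1ccc(F)cc1F",
                          ["warfarin"]),
}

query = "Which CYP2C9 interaction raises bleeding risk with the anticoagulant?"
print(build_prompt(query, retrieve(kg, query)))
```

A real pipeline would swap the lexical scorer for dense retrieval and send the assembled prompt to an LLM; the sketch only shows the shape of retrieving and packing multimodal KG context for reasoning.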
Similar Papers
MedBioRAG: Semantic Search and Retrieval-Augmented Generation with Large Language Models for Medical and Biological QA
Computation and Language
Helps doctors answer hard medical questions better.
MedBioLM: Optimizing Medical and Biological QA with Fine-Tuned Large Language Models and Retrieval-Augmented Generation
Computation and Language
Helps doctors answer hard medical questions better.
Benchmarking Retrieval-Augmented Multimodal Generation for Document Question Answering
Information Retrieval
Helps computers understand documents with text and pictures.