Transforming Questions and Documents for Semantically Aligned Retrieval-Augmented Generation
By: Seokgi Lee
Potential Business Impact:
Answers hard questions by breaking them down.
We introduce a novel retrieval-augmented generation (RAG) framework tailored for multi-hop question answering. First, our system uses a large language model (LLM) to decompose complex multi-hop questions into a sequence of single-hop subquestions that guide document retrieval. This decomposition mitigates the ambiguity inherent in multi-hop queries by clearly targeting distinct knowledge facets. Second, instead of embedding raw or chunked documents directly, we generate answerable questions from each document chunk using Qwen3-8B, embed these generated questions, and retrieve relevant chunks via question-question embedding similarity. During inference, the retrieved chunks are fed, along with the original question, into the RAG pipeline. We evaluate on three multi-hop question answering datasets (MuSiQue, 2WikiMultihopQA, HotpotQA) from LongBench. Our method improves RAG performance over baseline systems. Our contributions highlight the benefits of using answerable-question embeddings for RAG and the effectiveness of LLM-based query decomposition for multi-hop scenarios.
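The sketch below illustrates the two stages described above; it is not the authors' implementation. The helper llm_generate is a hypothetical placeholder for any instruction-tuned LLM (the paper names Qwen3-8B for question generation), and the all-MiniLM-L6-v2 SentenceTransformer checkpoint is an assumed embedding model picked for the example, since the abstract does not specify the embedding backbone.

```python
# Minimal sketch of the pipeline: (1) decompose a multi-hop question into
# single-hop subquestions, (2) index chunks by the answerable questions they
# can answer, (3) retrieve chunks via question-question cosine similarity.
from sentence_transformers import SentenceTransformer
import numpy as np

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def llm_generate(prompt: str) -> str:
    """Hypothetical placeholder for an LLM call (e.g. Qwen3-8B)."""
    raise NotImplementedError  # wire up your own LLM client here

def decompose(question: str) -> list[str]:
    """Stage 1: break a multi-hop question into single-hop subquestions."""
    out = llm_generate(
        "Decompose the following multi-hop question into single-hop "
        f"subquestions, one per line:\n{question}"
    )
    return [q.strip() for q in out.splitlines() if q.strip()]

def index_chunks(chunks: list[str]) -> tuple[np.ndarray, list[int]]:
    """Stage 2 (offline): generate answerable questions per chunk and embed them."""
    gen_questions, owners = [], []
    for i, chunk in enumerate(chunks):
        out = llm_generate(
            f"Write questions that this passage can answer, one per line:\n{chunk}"
        )
        for q in out.splitlines():
            if q.strip():
                gen_questions.append(q.strip())
                owners.append(i)  # remember which chunk each question came from
    vecs = embedder.encode(gen_questions, normalize_embeddings=True)
    return np.asarray(vecs), owners

def retrieve(subquestions: list[str], vecs: np.ndarray, owners: list[int],
             chunks: list[str], top_k: int = 2) -> list[str]:
    """Retrieve chunks whose generated questions best match each subquestion."""
    q_vecs = embedder.encode(subquestions, normalize_embeddings=True)
    hits = set()
    for qv in q_vecs:
        scores = vecs @ qv  # cosine similarity (embeddings are normalized)
        for j in np.argsort(-scores)[:top_k]:
            hits.add(owners[j])
    return [chunks[i] for i in sorted(hits)]
```

In this sketch the retrieved chunks would then be passed, together with the original multi-hop question, to the generator model to produce the final answer.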
Similar Papers
Knowledge Compression via Question Generation: Enhancing Multihop Document Retrieval without Fine-tuning
Information Retrieval
Helps computers find answers by asking questions.
Enhancing Document-Level Question Answering via Multi-Hop Retrieval-Augmented Generation with LLaMA 3
Computation and Language
Answers hard questions from long texts better.
Optimizing Multi-Hop Document Retrieval Through Intermediate Representations
Computation and Language
Helps AI answer harder questions faster.