Improving LLM Reasoning via Dependency-Aware Query Decomposition and Logic-Parallel Content Expansion

Published: October 28, 2025 | arXiv ID: 2510.24390v1

By: Xianjun Gao, Jianchun Liu, Hongli Xu and more

Potential Business Impact:

Makes AI answer questions much faster and smarter.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

The integration of Large Language Models (LLMs) into real-time Web applications, such as AI-powered search and conversational agents, presents a fundamental Web infrastructure challenge: reconciling the demand for high-quality, complex reasoning with the stringent low-latency and high-throughput requirements of interactive services. Current LLM reasoning, hindered by computationally inefficient sequential generation and rigid reasoning strategies, creates a critical bottleneck for Web services. Existing approaches typically optimize LLM reasoning for either efficiency or quality but struggle to achieve both, and thus fail to meet the dual requirements of modern Web platforms. To overcome these limitations, we propose Orion, a novel and efficient reasoning framework that enables dependency-aware query decomposition and logic-parallel content expansion. Concretely, Orion decomposes the reasoning process for a single query into two synergistic phases: (1) key point generation, which distills logically structured key points through retrieval-augmented few-shot prompting, and (2) content parallel expansion, which concurrently elaborates on these points based on a dependency graph to ensure logical consistency. Furthermore, Orion introduces a pipeline scheduling mechanism that exploits the complementary computational characteristics of the two phases (generation puts pressure on GPU compute, while expansion stresses GPU memory) across multiple queries, enabling cross-query parallelism and dramatically improving reasoning performance (i.e., efficiency and quality). Experiments on diverse benchmarks show that Orion not only delivers up to 4.33x higher token generation speed and 3.42x lower answer latency than the baselines but also improves reasoning quality by up to 18.75% by explicitly modeling inter-point dependencies.
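
As a rough illustration of the second phase, the sketch below expands key points concurrently while respecting a dependency graph. It is not Orion's implementation: the function name `expand_in_dependency_order`, the `deps` dictionary layout, and the `expand` callable (a stand-in for an LLM call) are all assumptions made for this example, and a thread pool is used only as a simple substitute for whatever parallel decoding the framework actually performs.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def expand_in_dependency_order(points, deps, expand, max_workers=4):
    """Expand each key point as soon as the points it depends on are done.

    points: list of key-point strings (hypothetical output of phase 1)
    deps:   dict mapping a point to the points it depends on (hypothetical)
    expand: callable(point, context) -> str, a stand-in for an LLM call;
            context maps each dependency to its already expanded text
    """
    results = {}
    remaining = {p: set(deps.get(p, ())) for p in points}
    running = {}  # future -> key point being expanded

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining or running:
            # Launch every point whose dependencies are all expanded.
            ready = [p for p, d in remaining.items() if d.issubset(results)]
            for p in ready:
                context = {d: results[d] for d in remaining.pop(p)}
                running[pool.submit(expand, p, context)] = p
            if not running:
                # Nothing is ready and nothing is in flight.
                raise ValueError("dependency cycle or unknown dependency")
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                results[running.pop(fut)] = fut.result()
    return results


def fake_expand(point, context):
    """Toy replacement for the model: records which context it was given."""
    return f"{point} (given: {sorted(context)})"
```

Calling `expand_in_dependency_order(["A", "B", "C"], {"C": ["A", "B"]}, fake_expand)` would expand A and B in parallel and start C only once both have finished, so the text for C can draw on the expanded A and B.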

ORION: Teaching Language Models to Reason Efficiently in the Language of Thought

Artificial Intelligence

Makes computers think faster and cheaper.

28 Nov 2025 2

91%

Think Before You Retrieve: Learning Test-Time Adaptive Search with Small Language Models

Artificial Intelligence

Teaches small computers to find information better.

10 Nov 2025 1

90%

From Query to Logic: Ontology-Driven Multi-Hop Reasoning in LLMs

Computation and Language

Helps computers answer tricky questions by thinking step-by-step.

2 Aug 2025 1

Page Count

9 pages
