ToM: Leveraging Tree-oriented MapReduce for Long-Context Reasoning in Large Language Models
By: Jiani Guo, Zuchao Li, Jie Wu, and more
Potential Business Impact:
Helps computers understand and reason over long documents better.
Large Language Models (LLMs), constrained by limited context windows, often face significant performance degradation when reasoning over long contexts. To address this, Retrieval-Augmented Generation (RAG) retrieves and reasons over chunks but frequently sacrifices logical coherence due to its reliance on similarity-based rankings. Similarly, divide-and-conquer frameworks (DCF) split documents into small chunks for independent reasoning and aggregation. While effective for local reasoning, DCF struggles to capture long-range dependencies and risks inducing conflicts by processing chunks in isolation. To overcome these limitations, we propose ToM, a novel Tree-oriented MapReduce framework for long-context reasoning. ToM leverages the inherent hierarchical structure of long documents (e.g., main headings and subheadings) by constructing a DocTree through hierarchical semantic parsing and performing bottom-up aggregation. Using a Tree MapReduce approach, ToM enables recursive reasoning: in the Map step, rationales are generated at child nodes; in the Reduce step, these rationales are aggregated across sibling nodes to resolve conflicts or reach consensus at parent nodes. Experimental results on 70B+ LLMs show that ToM significantly outperforms existing divide-and-conquer frameworks and retrieval-augmented generation methods, achieving better logical coherence and long-context reasoning. Our code is available at https://github.com/gjn12-31/ToM.
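To make the Map/Reduce recursion in the abstract concrete, here is a minimal Python sketch of bottom-up reasoning over a DocTree. The `DocNode` structure, the prompt wording, and the `llm_generate` placeholder are illustrative assumptions, not the authors' implementation; the paper's actual code is in the linked repository.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DocNode:
    """A DocTree node: a (sub)heading with its local text and child sections (assumed structure)."""
    title: str
    text: str = ""
    children: List["DocNode"] = field(default_factory=list)


def llm_generate(prompt: str) -> str:
    """Placeholder for a call to an LLM backend; swap in your own client here."""
    raise NotImplementedError


def tree_map_reduce(node: DocNode, question: str) -> str:
    """Reason over the DocTree bottom-up.

    Map: generate a rationale at each child node.
    Reduce: aggregate sibling rationales at the parent to resolve
    conflicts or reach consensus.
    """
    if not node.children:
        # Map step at a leaf: reason over the local chunk only.
        prompt = (
            f"Section: {node.title}\n{node.text}\n\n"
            f"Question: {question}\nGive a concise rationale."
        )
        return llm_generate(prompt)

    # Map step: recursively obtain rationales from all child nodes.
    child_rationales = [tree_map_reduce(child, question) for child in node.children]

    # Reduce step: aggregate the sibling rationales at the parent node.
    joined = "\n".join(f"- {r}" for r in child_rationales)
    prompt = (
        f"Section: {node.title}\n{node.text}\n\n"
        f"Question: {question}\n"
        f"Rationales from subsections:\n{joined}\n\n"
        "Resolve any conflicts and give a single consolidated rationale."
    )
    return llm_generate(prompt)


# Usage (hypothetical): answer = tree_map_reduce(doc_tree_root, "What is the main finding?")
```

The recursion mirrors the abstract's description: leaves produce local rationales (Map), and each parent merges its children's rationales (Reduce) until the root yields a single answer. How the DocTree is parsed from headings and how conflicts are adjudicated are details left to the paper.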
Similar Papers
Tree of Agents: Improving Long-Context Capabilities of Large Language Models through Multi-Perspective Reasoning
Artificial Intelligence
Helps AI remember and use long stories.
MoM: Mixtures of Scenario-Aware Document Memories for Retrieval-Augmented Generation Systems
Computation and Language
Computers understand books like people do.
Concept than Document: Context Compression via AMR-based Conceptual Entropy
Computation and Language
Makes AI understand long texts by removing extra words.