EffiReason-Bench: A Unified Benchmark for Evaluating and Advancing Efficient Reasoning in Large Language Models
By: Junquan Huang, Haotian Wu, Yubo Gao, and more
Potential Business Impact:
Makes AI explanations shorter and smarter.
Large language models (LLMs) with Chain-of-Thought (CoT) prompting achieve strong reasoning but often produce unnecessarily long explanations, increasing cost and sometimes reducing accuracy. Fair comparison of efficiency-oriented approaches is hindered by fragmented evaluation practices. We introduce EffiReason-Bench, a unified benchmark for rigorous cross-paradigm evaluation of efficient reasoning methods across three categories: Reasoning Blueprints, Dynamic Execution, and Post-hoc Refinement. To enable step-by-step evaluation, we construct verified CoT annotations for CommonsenseQA and LogiQA via a pipeline that enforces standardized reasoning structures, comprehensive option-wise analysis, and human verification. We evaluate 7 methods across 6 open-source LLMs (1B-70B) on 4 datasets spanning mathematics, commonsense, and logic, and propose the E3-Score, a principled metric inspired by economic trade-off modeling that provides smooth, stable evaluation without discontinuities or heavy reliance on heuristics. Experiments show that no single method universally dominates; optimal strategies depend on backbone scale, task complexity, and architecture.
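The abstract does not give the E3-Score's formula, so as a hedged illustration only, the sketch below shows a generic accuracy-versus-token-cost trade-off score of the kind such a metric might compute: accuracy blended with a smooth penalty on reasoning length. The function name `tradeoff_score`, the `alpha` weight, and the exponential smoothing are all assumptions for illustration, not the authors' E3-Score.

```python
# Hypothetical illustration only: a smooth accuracy-vs-token-cost trade-off
# score in the spirit of an efficiency metric. This is NOT the paper's
# E3-Score, whose exact definition is not given in the abstract.
import math

def tradeoff_score(accuracy: float, tokens_used: float,
                   tokens_budget: float, alpha: float = 0.5) -> float:
    """Combine accuracy (0-1) with a smooth token-cost penalty.

    alpha weights accuracy against efficiency; the exponential decay keeps
    the penalty smooth (no discontinuities), echoing the abstract's goal of
    stable evaluation without heavy reliance on heuristics. All parameters
    here are assumptions.
    """
    efficiency = math.exp(-tokens_used / tokens_budget)  # smooth cost decay in (0, 1]
    return alpha * accuracy + (1.0 - alpha) * efficiency

# Example: two methods with equal accuracy but different reasoning lengths.
print(tradeoff_score(accuracy=0.82, tokens_used=350, tokens_budget=1000))  # shorter CoT scores higher
print(tradeoff_score(accuracy=0.82, tokens_used=900, tokens_budget=1000))  # longer CoT scores lower
```

Under this kind of scoring, a method that matches another's accuracy while emitting a shorter chain of thought receives a strictly higher score, which is the trade-off the benchmark is designed to surface.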
Similar Papers
ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning
Artificial Intelligence
Tests if AI's thinking is reliable.
Mechanics of Learned Reasoning 1: TempoBench, A Benchmark for Interpretable Deconstruction of Reasoning System Performance
Artificial Intelligence
Tests how well computers can think step-by-step.
Reasoning Efficiently Through Adaptive Chain-of-Thought Compression: A Self-Optimizing Framework
Software Engineering
Makes smart computer programs think faster and better.