Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study
By: Yujun Zhou, Jiayi Ye, Zipeng Ling, and more
Potential Business Impact:
Helps AI models reason step-by-step more reliably and verifiably.
Logical reasoning is a core capability for many applications of large language models (LLMs), yet existing benchmarks often rely solely on final-answer accuracy, failing to capture the quality and structure of the reasoning process. We propose FineLogic, a fine-grained evaluation framework that assesses logical reasoning across three dimensions: overall benchmark accuracy, stepwise soundness, and representation-level alignment. In addition, to better understand how reasoning capabilities emerge, we conduct a comprehensive study on the effects of supervision format during fine-tuning. We construct four supervision styles (one natural language and three symbolic variants) and train LLMs under each. Our findings reveal that natural language supervision yields strong generalization even on out-of-distribution and long-context tasks, while symbolic reasoning styles promote more structurally sound and atomic inference chains. Further, our representation-level probing shows that fine-tuning primarily improves reasoning behaviors through step-by-step generation, rather than enhancing shortcut prediction or internalized correctness. Together, our framework and analysis provide a more rigorous and interpretable lens for evaluating and improving logical reasoning in LLMs.
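To make the three evaluation dimensions concrete, here is a minimal Python sketch of how a single reasoning trace might be scored under a FineLogic-style setup. The names `check_step_entailment` and `probe_alignment` are illustrative assumptions standing in for a step verifier and a representation probe; the paper's actual interface is not specified here.

```python
# Minimal sketch of a FineLogic-style evaluation loop.
# Hypothetical helpers: `check_step_entailment` stands in for any
# symbolic/NLI-based step verifier, `probe_alignment` for a probe over
# the model's hidden representations.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ReasoningTrace:
    question: str
    steps: List[str]          # model-generated intermediate inference steps
    predicted_answer: str
    gold_answer: str


def evaluate_trace(
    trace: ReasoningTrace,
    check_step_entailment: Callable[[List[str], str], bool],
    probe_alignment: Callable[[ReasoningTrace], float],
) -> dict:
    """Score one reasoning trace along three dimensions."""
    # 1) Overall benchmark accuracy: final answer matches the gold label.
    accuracy = float(trace.predicted_answer.strip() == trace.gold_answer.strip())

    # 2) Stepwise soundness: each step should follow from the question
    #    plus the steps that precede it (verifier is an assumption).
    sound_steps = 0
    for i, step in enumerate(trace.steps):
        premises = [trace.question] + trace.steps[:i]
        if check_step_entailment(premises, step):
            sound_steps += 1
    stepwise_soundness = sound_steps / max(len(trace.steps), 1)

    # 3) Representation-level alignment: score from a probe over hidden
    #    states (probe construction omitted in this sketch).
    alignment = probe_alignment(trace)

    return {
        "accuracy": accuracy,
        "stepwise_soundness": stepwise_soundness,
        "representation_alignment": alignment,
    }
```

A real harness would aggregate these per-trace scores over a benchmark and compare models fine-tuned under the four supervision styles (one natural-language and three symbolic variants) described in the abstract.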