Human-Level Reasoning: A Comparative Study of Large Language Models on Logical and Abstract Reasoning
By: Benjamin Grando Moreira
Potential Business Impact:
Tests if AI can think like a person.
Evaluating reasoning ability in Large Language Models (LLMs) is important for advancing artificial intelligence, as it transcends mere linguistic task performance: the question is whether these models genuinely understand information, perform inferences, and draw logically valid conclusions. This study compares the logical and abstract reasoning skills of several LLMs - including GPT, Claude, DeepSeek, Gemini, Grok, Llama, Mistral, Perplexity, and Sabiá - using a set of eight custom-designed reasoning questions. The LLM results are benchmarked against human performance on the same tasks, revealing significant differences and indicating areas where LLMs struggle with deduction.
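The comparison described in the abstract amounts to scoring each model's answers on the eight questions and measuring the gap against a human baseline. A minimal sketch of that kind of tally is shown below; the model names, per-question scores, and the human baseline here are illustrative placeholders, not results from the paper.

```python
# Minimal sketch (hypothetical data): per-model accuracy on eight reasoning
# questions, compared against an assumed average-human baseline.

# 1 = correct, 0 = incorrect, one entry per question.
results = {
    "GPT":    [1, 1, 0, 1, 0, 1, 1, 0],
    "Claude": [1, 0, 1, 1, 0, 1, 0, 1],
    "Llama":  [1, 0, 0, 1, 0, 0, 1, 0],
}
human_baseline = [1, 1, 1, 1, 0, 1, 1, 1]  # placeholder human scores

def accuracy(scores):
    """Fraction of the questions answered correctly."""
    return sum(scores) / len(scores)

human_acc = accuracy(human_baseline)
print(f"Human baseline: {human_acc:.0%}")
for model, scores in sorted(results.items()):
    gap = accuracy(scores) - human_acc
    print(f"{model}: {accuracy(scores):.0%} ({gap:+.0%} vs. human)")
```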
Similar Papers
Logical Reasoning in Large Language Models: A Survey
Artificial Intelligence
Makes AI better at solving puzzles and thinking logically.
Thinking Machines: A Survey of LLM based Reasoning Strategies
Computation and Language
Makes AI think better to solve hard problems.
Do Large Language Models Excel in Complex Logical Reasoning with Formal Language?
Computation and Language
Makes computers better at solving logic puzzles.