Rosetta-PL: Propositional Logic as a Benchmark for Large Language Model Reasoning
By: Shaun Baek, Shaun Esua-Mensah, Cyrus Tsui, and more
Potential Business Impact:
Teaches computers to think logically and solve problems.
Large Language Models (LLMs) are primarily trained on high-resource natural languages, limiting their effectiveness in low-resource settings and in tasks requiring deep logical reasoning. This research introduces Rosetta-PL, a benchmark designed to evaluate LLMs' logical reasoning and generalization capabilities in a controlled environment. We construct Rosetta-PL by translating a dataset of logical propositions from Lean into a custom logical language, which we then use to fine-tune an LLM (e.g., GPT-4o). Our experiments analyze how dataset size and translation methodology affect model performance. Our results indicate that preserving logical relationships during translation significantly boosts precision, with accuracy plateauing beyond roughly 20,000 training samples. These insights provide valuable guidelines for optimizing LLM training on formal reasoning tasks and for improving performance in low-resource language applications.
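To make the translation step concrete, here is a minimal sketch of a structure-preserving mapping from standard propositional notation into a made-up target vocabulary. The symbol map, variable names, and the `translate` helper are all hypothetical illustrations of the general idea (replace each connective and atom with a new token while leaving the formula's shape untouched); they are not the actual Rosetta-PL vocabulary or the authors' pipeline, which the abstract does not specify.

```python
import re

# Hypothetical symbol maps; the real Rosetta-PL vocabulary is not given in the abstract.
SYMBOL_MAP = {
    "∧": "⊗",   # conjunction
    "∨": "⊕",   # disjunction
    "¬": "~",   # negation
    "→": "=>",  # implication
}
VARIABLE_MAP = {"p": "alpha", "q": "beta", "r": "gamma"}


def translate(formula: str) -> str:
    """Map a propositional formula to the custom language token by token.

    Because each token is substituted in place, the operator/operand
    structure -- and hence the logical relationships -- is preserved.
    """
    out = []
    for token in re.findall(r"[()]|[^\s()]+", formula):
        if token in SYMBOL_MAP:
            out.append(SYMBOL_MAP[token])
        elif token in VARIABLE_MAP:
            out.append(VARIABLE_MAP[token])
        else:
            out.append(token)  # parentheses and unmapped tokens pass through
    return " ".join(out)


if __name__ == "__main__":
    # "( p ∧ q ) → r" keeps its shape: "( alpha ⊗ beta ) => gamma"
    print(translate("( p ∧ q ) → r"))
```

A translation of this kind keeps the proposition's parse tree intact, which is the property the abstract credits with the reported precision gains; a translation that scrambled operator positions or merged tokens would not.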
Similar Papers
Can Large Language Models Learn Formal Logic? A Data-Driven Training and Evaluation Framework
Machine Learning (CS)
Teaches computers to prove math problems correctly.
LogiPlan: A Structured Benchmark for Logical Planning and Relational Reasoning in LLMs
Artificial Intelligence
Tests how well computers can plan and think logically.
Reasoning Capabilities and Invariability of Large Language Models
Computation and Language
Tests if computers can think logically.