HoarePrompt: Structural Reasoning About Program Correctness in Natural Language
By: Dimitrios Stamatios Bouras, Yihan Dai, Tairan Wang, and more
Potential Business Impact:
Checks if computer code follows instructions.
While software requirements are often expressed in natural language, verifying the correctness of a program against such requirements is a hard and underexplored problem. Large language models (LLMs) are promising candidates for addressing this challenge; however, our experience shows that they are ineffective at this task, often failing to detect even straightforward bugs. To address this gap, we introduce HoarePrompt, a novel approach that adapts fundamental ideas from program verification to natural language artifacts. Inspired by the strongest postcondition calculus, HoarePrompt employs a systematic, step-by-step process in which an LLM generates natural language descriptions of the reachable program states at various points in the code. To handle loops, we propose few-shot-driven k-induction, an adaptation of the k-induction method widely used in model checking. Once program states are described, HoarePrompt leverages the LLM to assess whether the program, annotated with these state descriptions, conforms to the natural language requirements. To evaluate the quality of classifiers of program correctness with respect to natural language requirements, we constructed CoCoClaNeL, a challenging dataset of solutions to programming competition problems. Our experiments show that HoarePrompt improves the Matthews correlation coefficient (MCC) by 61% compared to directly using Zero-shot-CoT prompts for correctness classification. Furthermore, HoarePrompt outperforms a classifier that assesses correctness via LLM-based test generation by a 106% increase in MCC. The inductive reasoning mechanism alone contributes a 26% boost to MCC, underscoring its effectiveness in managing loops.
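To make the pipeline concrete, below is a minimal sketch, in Python, of how the steps described in the abstract could fit together. The `query_llm` helper, the prompt wording, and the function names are all hypothetical illustrations under our own assumptions, not the paper's actual implementation or prompt templates.

```python
# A minimal sketch of a HoarePrompt-style pipeline as described in the
# abstract. `query_llm` is a hypothetical stand-in for any LLM client;
# the prompts are illustrative, not the paper's actual templates.

def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real client of your choice."""
    raise NotImplementedError("plug in an LLM client here")

def describe_states(statements: list[str]) -> list[str]:
    """Step-by-step state propagation in the spirit of the strongest
    postcondition calculus: the state reached after statement i serves
    as the precondition for statement i+1."""
    state = "no variables are set"
    annotated = []
    for stmt in statements:
        prompt = (
            f"Precondition (natural language): {state}\n"
            f"Statement: {stmt}\n"
            "Describe the program state reachable after this statement."
        )
        state = query_llm(prompt)
        annotated.append(f"{stmt}  # state: {state}")
    return annotated

def summarize_loop(loop_body: str, k: int = 3) -> str:
    """Loosely follows the abstract's few-shot-driven k-induction:
    unroll the first k iterations, then ask for a state description
    that holds after an arbitrary number of iterations."""
    base = query_llm(
        f"Simulate the first {k} iterations of this loop and describe "
        f"the state after each:\n{loop_body}"
    )
    return query_llm(
        f"Given these first {k} iteration states:\n{base}\n"
        "Infer a natural language description of the state after "
        "any number of iterations (the inductive step)."
    )

def classify_correctness(annotated_program: str, requirements: str) -> bool:
    """Final verdict: does the state-annotated program conform to the
    natural language requirements?"""
    verdict = query_llm(
        f"Requirements: {requirements}\n"
        f"Program annotated with reachable states:\n{annotated_program}\n"
        "Answer CORRECT or INCORRECT."
    )
    return "INCORRECT" not in verdict.upper()
```

The design choice the abstract emphasizes is decomposition: state descriptions are propagated statement by statement, and loops are summarized inductively, before any correctness verdict is requested, rather than asking the LLM for a judgment in one shot.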
Similar Papers
Understanding LLM Scientific Reasoning through Promptings and Model's Explanation on the Answers
Artificial Intelligence
Makes AI better at solving hard science problems.
Are Prompts All You Need? Evaluating Prompt-Based Large Language Models (LLMs) for Software Requirements Classification
Software Engineering
Helps computers sort software ideas faster, needing less data.
Short-Path Prompting in LLMs: Analyzing Reasoning Instability and Solutions for Robust Performance
Computation and Language
Makes AI think better even with short questions.