DETAIL Matters: Measuring the Impact of Prompt Specificity on Reasoning in Large Language Models
By: Olivia Kim
Potential Business Impact:
Makes AI smarter by giving it clearer instructions.
Prompt design plays a critical role in the reasoning performance of large language models (LLMs), yet the impact of prompt specificity (how detailed or vague a prompt is) remains understudied. This paper introduces DETAIL, a framework for evaluating LLM performance across varying levels of prompt specificity. We generate multi-level prompts using GPT-4, quantify specificity via perplexity, and assess correctness using GPT-based semantic equivalence. Experiments on 30 novel reasoning tasks across GPT-4 and o3-mini reveal that greater specificity improves accuracy, especially for smaller models and procedural tasks. Our results highlight the need for adaptive prompting strategies and provide tools and data to support further research.
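The abstract says specificity is quantified via perplexity. Below is a minimal sketch of what such a scoring step might look like, assuming a Hugging Face GPT-2 model as the reference scorer and two hypothetical example prompts; the paper does not specify which model it uses for perplexity, so this is illustrative only, not the authors' implementation.

import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Assumption: GPT-2 stands in for whatever reference model DETAIL uses.
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

def prompt_perplexity(prompt: str) -> float:
    """Perplexity of a prompt under the reference model, used here as a
    proxy score for how specific (detailed) the prompt is."""
    enc = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        # Passing the inputs as labels yields the mean token cross-entropy.
        out = model(**enc, labels=enc["input_ids"])
    return math.exp(out.loss.item())

# Hypothetical vague vs. detailed variants of the same task.
vague = "Solve the problem."
detailed = ("List the steps needed to compute the median of an unsorted "
            "array of integers, then apply them to [7, 2, 9, 4, 1].")
print(prompt_perplexity(vague), prompt_perplexity(detailed))

In this sketch the perplexity values simply label each prompt variant with a specificity score; how those scores map onto the paper's specificity levels is defined by the DETAIL framework itself.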
Similar Papers
Prompt Engineering: How Prompt Vocabulary affects Domain Knowledge
Computation and Language
Makes AI answer science and law questions better.
Why Prompt Design Matters and Works: A Complexity Analysis of Prompt Search Space in LLMs
Computation and Language
Finds better ways for computers to think.
Dissecting Clinical Reasoning in Language Models: A Comparative Study of Prompts and Model Adaptation Strategies
Computation and Language
Helps doctors understand patient notes better.