Budget-Aware Anytime Reasoning with LLM-Synthesized Preference Data
By: Xuanming Zhang, Shwan Ashrafi, Aziza Mirsaidova, and more
Potential Business Impact:
Helps AI make better choices faster.
We study the reasoning behavior of large language models (LLMs) under limited computation budgets. In such settings, producing useful partial solutions quickly is often more practical than exhaustive reasoning, which incurs high inference costs. Many real-world tasks, such as trip planning, require models to deliver the best possible output within a fixed reasoning budget. We introduce an anytime reasoning framework and the Anytime Index, a metric that quantifies how effectively solution quality improves as reasoning tokens increase. To further enhance efficiency, we propose an inference-time self-improvement method using LLM-synthesized preference data, where models learn from their own reasoning comparisons to produce better intermediate solutions. Experiments on NaturalPlan (Trip), AIME, and GPQA datasets show consistent gains across Grok-3, GPT-oss, GPT-4.1/4o, and LLaMA models, improving both reasoning quality and efficiency under budget constraints.
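The abstract does not spell out how the Anytime Index is computed, but a metric that "quantifies how effectively solution quality improves as reasoning tokens increase" can be sketched as a normalized area under the quality-vs-budget curve. The function and data below are illustrative assumptions, not the paper's exact definition:

```python
# Hypothetical sketch of an "Anytime Index"-style metric.
# Assumption: the index is the normalized area under the
# quality-vs-token-budget curve, so models that reach good
# partial solutions earlier score higher.

def anytime_index(budgets, qualities):
    """Trapezoidal area under the quality curve, normalized by total budget.

    budgets   -- increasing token budgets at which solutions were scored
    qualities -- solution quality in [0, 1] at each budget
    """
    assert len(budgets) == len(qualities) >= 2
    area = 0.0
    for i in range(1, len(budgets)):
        width = budgets[i] - budgets[i - 1]
        area += width * (qualities[i] + qualities[i - 1]) / 2.0
    return area / (budgets[-1] - budgets[0])

# A model whose quality rises early outscores one that only
# improves near the end of the budget (made-up numbers).
early = anytime_index([0, 100, 200, 400], [0.0, 0.7, 0.8, 0.9])
late = anytime_index([0, 100, 200, 400], [0.0, 0.1, 0.2, 0.9])
```

Under this reading, `early` evaluates to 0.7 and `late` to 0.325, capturing the intuition that useful partial solutions delivered sooner are rewarded.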
Similar Papers
LLMs for Resource Allocation: A Participatory Budgeting Approach to Inferring Preferences
Artificial Intelligence
Helps computers fairly share money for projects.
Reasoning with Preference Constraints: A Benchmark for Language Models in Many-to-One Matching Markets
Artificial Intelligence
Helps computers match students to colleges fairly.
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
Machine Learning (CS)
Makes AI think better with less effort.