Score: 1

BrowseConf: Confidence-Guided Test-Time Scaling for Web Agents

Published: October 27, 2025 | arXiv ID: 2510.23458v2

By: Litu Ou, Kuan Li, Huifeng Yin, and more

Potential Business Impact:

Lets AI agents signal how confident they are in their answers, so low-confidence results can be retried instead of trusted.

Business Areas:
Semantic Search, Internet Services

Confidence in LLMs is a useful indicator of model uncertainty and answer reliability. Existing work has mainly focused on single-turn scenarios, while research on confidence in complex multi-turn interactions remains limited. In this paper, we investigate whether LLM-based search agents can communicate their own confidence through verbalized confidence scores after long sequences of actions, a significantly more challenging task than outputting confidence in a single interaction. Experimenting with open-source agentic models, we first find that models exhibit much higher task accuracy when confidence is high and near-zero accuracy when confidence is low. Based on this observation, we propose Test-Time Scaling (TTS) methods that use confidence scores to gauge answer quality, encouraging the model to try again until it reaches a satisfactory confidence level. Results show that our proposed methods significantly reduce token consumption while achieving competitive performance compared to baseline fixed-budget TTS methods.
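To make the retry idea concrete, here is a minimal sketch of a confidence-gated loop, assuming a hypothetical `run_agent` rollout function, a simple "Confidence: 0.9"-style verbalized score, and placeholder threshold and budget values; it is an illustration of the general technique, not the paper's actual implementation.

```python
import re

CONFIDENCE_THRESHOLD = 0.85   # assumed value; the paper tunes this per model/benchmark
MAX_ATTEMPTS = 4              # assumed retry budget

def extract_confidence(answer_text: str) -> float:
    """Parse a verbalized score such as 'Confidence: 0.9' from the agent's final answer."""
    match = re.search(r"confidence[:\s]+([01](?:\.\d+)?)", answer_text, re.IGNORECASE)
    return float(match.group(1)) if match else 0.0

def confidence_guided_tts(question: str, run_agent) -> str:
    """Re-run the multi-turn agent until its verbalized confidence clears the threshold."""
    best_answer, best_conf = "", -1.0
    for _ in range(MAX_ATTEMPTS):
        answer = run_agent(question)      # one full multi-turn search/browse rollout
        conf = extract_confidence(answer)
        if conf > best_conf:
            best_answer, best_conf = answer, conf
        if conf >= CONFIDENCE_THRESHOLD:
            break                         # stop early, saving tokens vs. a fixed budget
    return best_answer
```

The key contrast with fixed-budget TTS is the early exit: high-confidence answers terminate the loop immediately, so extra rollouts are spent only on questions the agent is unsure about.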

Country of Origin
🇬🇧 United Kingdom

Repos / Data Links

Page Count
25 pages

Category
Computer Science:
Computation and Language