TravelBench: A Real-World Benchmark for Multi-Turn and Tool-Augmented Travel Planning
By: Xiang Cheng, Yulan Hu, Xiangwen Zhang, and others
Large language model (LLM) agents have demonstrated strong capabilities in planning and tool use. Travel planning provides a natural and high-impact testbed for these capabilities, as it requires multi-step reasoning, iterative preference elicitation through interaction, and calls to external tools under evolving constraints. Prior work has studied LLMs on travel-planning tasks, but existing settings are limited in domain coverage and multi-turn interaction; because they cannot support dynamic user-agent interaction, they fail to comprehensively assess agent capabilities. In this paper, we introduce TravelBench, a real-world travel-planning benchmark featuring multi-turn interaction and tool use. We collect user requests from real-world scenarios and construct three subsets (multi-turn, single-turn, and unsolvable) to evaluate different aspects of agent performance. For stable and reproducible evaluation, we build a controlled sandbox environment with 10 travel-domain tools, providing deterministic tool outputs for reliable reasoning. We evaluate multiple LLMs on TravelBench and analyze their behaviors and performance. TravelBench offers a practical and reproducible benchmark for advancing LLM agents in travel planning.
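The abstract does not describe the sandbox's actual API, but the core idea (deterministic tool outputs for reproducible agent evaluation) can be sketched with a mock travel tool whose results depend only on its arguments, never on wall-clock time or a live backend. All names and schemas below are illustrative assumptions, not details from the paper:

```python
import hashlib

def deterministic_flight_search(origin: str, destination: str, date: str) -> list[dict]:
    """Hypothetical sandbox tool: always returns the same flight list
    for the same arguments, so agent runs are reproducible."""
    # Seed derived purely from the arguments, so repeated calls agree.
    key = f"{origin}|{destination}|{date}".encode()
    seed = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    flights = []
    for i in range(3):
        flights.append({
            "flight_no": f"TB{(seed + i) % 9000 + 1000}",
            "origin": origin,
            "destination": destination,
            "date": date,
            "price_usd": 120 + (seed >> (i * 3)) % 300,
        })
    return flights

# Repeated calls with identical arguments yield identical outputs.
a = deterministic_flight_search("PEK", "SFO", "2025-06-01")
b = deterministic_flight_search("PEK", "SFO", "2025-06-01")
assert a == b
```

Under this design, any variation across evaluation runs is attributable to the agent rather than to the environment, which is what makes cross-model comparisons on the benchmark stable.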
Similar Papers
TripScore: Benchmarking and rewarding real-world travel planning with fine-grained evaluation
Artificial Intelligence
Benchmarks and rewards real-world travel planning with fine-grained evaluation criteria.
CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents
Artificial Intelligence
Evaluates whether LLM tool-use agents can plan cost-optimally and adapt across multiple turns in dynamic environments.
Flex-TravelPlanner: A Benchmark for Flexible Planning with Language Agents
Computation and Language
Tests whether language agents can revise travel plans flexibly as constraints change.