SeqWalker: Sequential-Horizon Vision-and-Language Navigation with Hierarchical Planning

Published: January 8, 2026 | arXiv ID: 2601.04699v1

By: Zebin Han, Xudong Wang, Baichen Liu and more

Potential Business Impact:

Helps robots follow long, tricky directions.

Business Areas:

Navigation, Navigation and Mapping

Sequential-Horizon Vision-and-Language Navigation (SH-VLN) is a challenging setting in which an agent must execute multiple navigation tasks in sequence, guided by a complex, long-horizon language instruction. Current vision-and-language navigation models degrade significantly on such multi-task instructions, because information overload impairs the agent's ability to attend to the details relevant to its current observations. To address this problem, we propose SeqWalker, a navigation model built on a hierarchical planning framework. SeqWalker features: i) a High-Level Planner that dynamically selects contextually relevant sub-instructions from the global instruction based on the agent's current visual observations, thus reducing cognitive load; and ii) a Low-Level Planner incorporating an Exploration-Verification strategy that leverages the inherent logical structure of instructions for trajectory error correction. To evaluate SH-VLN performance, we also extend the IVLN dataset and establish a new benchmark. Extensive experiments demonstrate the superiority of the proposed SeqWalker.
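
The two-level idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `SubInstruction`, `select_sub_instruction`, `verify`, and `run_episode` are hypothetical, observations are reduced to sets of landmark words, and landmark overlap stands in for whatever learned selection and verification the paper actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubInstruction:
    """One step of a long instruction (hypothetical structure)."""
    text: str
    landmarks: frozenset  # assumed: landmark words mentioned in this step


def select_sub_instruction(sub_instructions, completed, observed):
    """Toy high-level planner: among unfinished steps, return the index of
    the first one whose landmarks overlap the current observation; if none
    overlap, fall back to the earliest unfinished step."""
    pending = [i for i in range(len(sub_instructions)) if i not in completed]
    if not pending:
        return None
    for i in pending:
        if sub_instructions[i].landmarks & observed:
            return i
    return pending[0]


def verify(sub_instruction, observed):
    """Toy verification: a step counts as done only when every landmark it
    mentions is visible in the current observation."""
    return sub_instruction.landmarks <= observed


def run_episode(sub_instructions, observations):
    """Toy low-level loop: keep exploring under the selected step until it
    verifies, then mark it complete and let the planner pick the next one."""
    completed = set()
    log = []
    for observed in observations:
        idx = select_sub_instruction(sub_instructions, completed, observed)
        if idx is None:
            break  # every step has been verified
        if verify(sub_instructions[idx], observed):
            completed.add(idx)
            log.append(("done", idx))
        else:
            log.append(("explore", idx))
    return log
```

In this sketch the agent only ever reasons about one short sub-instruction at a time, which is the kind of load reduction the abstract attributes to the High-Level Planner, and a step is not marked complete until it passes the verification check.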

Breaking Down and Building Up: Mixture of Skill-Based Vision-and-Language Navigation Agents

Artificial Intelligence

Helps robots follow directions in new places.

11 Aug 2025

FlexVLN: Flexible Adaptation for Diverse Vision-and-Language Navigation Tasks

CV and Pattern Recognition

Helps robots learn new places without retraining.

18 Mar 2025

Ground-level Viewpoint Vision-and-Language Navigation in Continuous Environments

Robotics

Helps robots see and follow instructions better.

26 Feb 2025

Page Count

16 pages
