AstraNav-Memory: Context Compression for Long Memory
By: Botao Ren, Junjun Hu, Xinda Xue, and more
Lifelong embodied navigation requires agents to accumulate, retain, and exploit spatial-semantic experience across tasks, enabling efficient exploration in novel environments and rapid goal reaching in familiar ones. While object-centric memory is interpretable, it depends on detection and reconstruction pipelines that limit robustness and scalability. We propose an image-centric memory framework that achieves long-term implicit memory via an efficient visual context compression module coupled end-to-end with a Qwen2.5-VL-based navigation policy. Built atop a ViT backbone with frozen DINOv3 features and lightweight PixelUnshuffle+Conv blocks, our visual tokenizer supports configurable compression rates; under a representative 16$\times$ compression setting, for example, each image is encoded with about 30 tokens, expanding the effective context capacity from tens of images to hundreds. Experiments on GOAT-Bench and HM3D-OVON show that our method achieves state-of-the-art navigation performance, improving exploration in unfamiliar environments and shortening paths in familiar ones. Ablation studies further reveal that moderate compression strikes the best balance between efficiency and accuracy. These findings position compressed image-centric memory as a practical, scalable interface for lifelong embodied agents, enabling them to reason over long visual histories and navigate with human-like efficiency.
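To make the compression mechanism concrete, the sketch below shows one plausible PixelUnshuffle+Conv token compressor of the kind the abstract describes: patch tokens from a frozen encoder are reshaped into a 2D grid, an r×r spatial neighborhood is folded into the channel dimension, and a lightweight convolution projects the result to the language model's embedding width. The grid size, feature width, output width, and layer choices here are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of a PixelUnshuffle+Conv visual-token compressor.
# All dimensions (dim, out_dim, grid) are hypothetical placeholders.
import torch
import torch.nn as nn

class TokenCompressor(nn.Module):
    """Compresses a ViT patch-token grid by a factor of r*r (r=4 gives 16x)."""
    def __init__(self, dim: int = 1024, out_dim: int = 3584, r: int = 4):
        super().__init__()
        self.r = r
        # PixelUnshuffle folds each r x r spatial neighborhood into channels:
        # (B, C, H, W) -> (B, C*r*r, H/r, W/r).
        self.unshuffle = nn.PixelUnshuffle(r)
        # A 1x1 conv mixes the folded neighborhood and projects to the
        # LLM embedding width (out_dim stands in for Qwen2.5-VL's width).
        self.proj = nn.Conv2d(dim * r * r, out_dim, kernel_size=1)

    def forward(self, tokens: torch.Tensor, grid: int) -> torch.Tensor:
        # tokens: (B, grid*grid, dim) patch features from a frozen encoder.
        b, n, c = tokens.shape
        x = tokens.transpose(1, 2).reshape(b, c, grid, grid)  # to a 2D map
        x = self.proj(self.unshuffle(x))  # (B, out_dim, grid/r, grid/r)
        return x.flatten(2).transpose(1, 2)  # (B, n / r^2, out_dim)

# Example: a 24x24 grid (576 tokens) compresses 16x to 36 tokens per image,
# in the same ballpark as the ~30 tokens quoted in the abstract.
feats = torch.randn(1, 576, 1024)
print(TokenCompressor()(feats, grid=24).shape)  # torch.Size([1, 36, 3584])
```

Because the r×r neighborhood is folded losslessly into channels before projection, the conv can in principle learn what to keep from each neighborhood, which is what lets a moderate compression rate trade little accuracy for a large gain in context capacity.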