Efficient Context Scaling with LongCat ZigZag Attention

Published: December 30, 2025 | arXiv ID: 2512.23966v1

By: Chen Zhang, Yang Bai, Jiahuan Li, and more

We introduce LongCat ZigZag Attention (LoZA), a sparse attention scheme designed to transform any existing full-attention model into a sparse version with a limited compute budget. In long-context scenarios, LoZA achieves significant speed-ups for both prefill-intensive (e.g., retrieval-augmented generation) and decode-intensive (e.g., tool-integrated reasoning) workloads. Specifically, by applying LoZA to LongCat-Flash during mid-training, we serve LongCat-Flash-Exp as a long-context foundation model that can swiftly process up to 1 million tokens, enabling efficient long-term reasoning and long-horizon agentic capabilities.
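The abstract does not specify the ZigZag sparsity pattern itself. As a hedged, generic illustration of how a sparse attention mask restricts which keys each query attends to, and thereby reduces long-context compute relative to full attention, here is a minimal NumPy sketch using a sliding local window plus a few always-visible "sink" tokens. This is a common stand-in pattern, not the actual LoZA scheme; the function names and the `window` and `n_sink` parameters are assumptions for illustration only.

```python
# Illustrative sketch only: the LoZA "ZigZag" pattern is not described in this
# abstract. This shows the general idea of sparse attention -- each query
# attends to a restricted set of keys (here: a causal sliding window plus a few
# global "sink" tokens), so attended entries grow roughly linearly with
# sequence length instead of quadratically.
import numpy as np


def sparse_attention_mask(seq_len: int, window: int = 128, n_sink: int = 4) -> np.ndarray:
    """Boolean causal mask: True where a query position may attend to a key position."""
    q = np.arange(seq_len)[:, None]
    k = np.arange(seq_len)[None, :]
    causal = k <= q                    # never attend to future tokens
    local = (q - k) < window           # recent tokens inside the sliding window
    sink = k < n_sink                  # leading tokens visible to every query
    return causal & (local | sink)


def masked_softmax_attention(qk_scores: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Softmax over allowed key positions only."""
    scores = np.where(mask, qk_scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    n = 1024
    mask = sparse_attention_mask(n)
    full = n * (n + 1) // 2            # entries in a full causal attention mask
    print(f"attended entries: {mask.sum()} / {full} "
          f"({mask.sum() / full:.1%} of full causal attention)")

    rng = np.random.default_rng(0)
    attn = masked_softmax_attention(rng.standard_normal((n, n)), mask)
    assert np.allclose(attn.sum(axis=-1), 1.0)  # each row is a valid distribution
```

With a fixed window, the number of attended entries per query stops growing with sequence length, which is the basic mechanism behind the prefill and decode speed-ups the abstract claims; the specific ZigZag layout and the mid-training conversion procedure are detailed in the paper itself.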

Category
Computer Science:
Computation and Language