Artificial Hippocampus Networks for Efficient Long-Context Modeling
By: Yunhao Fang, Weihao Yu, Shu Zhong, and more
Potential Business Impact:
Helps AI models remember much longer inputs while using less compute and memory.
Long-sequence modeling faces a fundamental trade-off between the efficiency of compressive fixed-size memory in RNN-like models and the fidelity of lossless growing memory in attention-based Transformers. Inspired by the Multi-Store Model in cognitive science, we introduce a memory framework of artificial neural networks. Our method maintains a sliding window of the Transformer's KV cache as lossless short-term memory, while a learnable module termed Artificial Hippocampus Network (AHN) recurrently compresses out-of-window information into a fixed-size compact long-term memory. To validate this framework, we instantiate AHNs using modern RNN-like architectures, including Mamba2, DeltaNet, and Gated DeltaNet. Extensive experiments on the long-context benchmarks LV-Eval and InfiniteBench demonstrate that AHN-augmented models consistently outperform sliding-window baselines and achieve performance comparable to or even superior to full-attention models, while substantially reducing computational and memory requirements. For instance, augmenting Qwen2.5-3B-Instruct with AHNs reduces inference FLOPs by 40.5% and memory cache by 74.0%, while improving its average score on LV-Eval (128k sequence length) from 4.41 to 5.88. Code is available at: https://github.com/ByteDance-Seed/AHN.
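To make the mechanism concrete, here is a minimal illustrative sketch, not the authors' released implementation (see the linked repository for that): exact attention over a sliding window of recent key/value pairs acts as lossless short-term memory, and each evicted pair is folded into a fixed-size recurrent state by a simple gated linear recurrence standing in for the AHN (the paper instantiates AHNs with Mamba2, DeltaNet, or Gated DeltaNet). All names here (ToyAHNAttention, window, gate) are assumptions for illustration only.

```python
# Minimal sketch of the sliding-window + compressive-memory idea.
# NOT the authors' method: the AHN is replaced by a toy gated linear recurrence.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyAHNAttention(nn.Module):
    def __init__(self, d_model: int, window: int):
        super().__init__()
        self.window = window
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Stand-in "artificial hippocampus": folds evicted (k, v) pairs
        # into a fixed-size d_model x d_model state via a learned gate.
        self.gate = nn.Linear(2 * d_model, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, d_model); single head, no batching, for clarity.
        seq_len, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        state = torch.zeros(d, d)            # compact long-term memory
        keys, values, outputs = [], [], []
        for t in range(seq_len):
            keys.append(k[t])
            values.append(v[t])
            if len(keys) > self.window:
                # Evict the oldest in-window pair and compress it into the state.
                old_k, old_v = keys.pop(0), values.pop(0)
                g = torch.sigmoid(self.gate(torch.cat([old_k, old_v])))
                state = g * state + torch.outer(old_k, old_v)
            # Lossless short-term memory: exact attention over the window.
            K, V = torch.stack(keys), torch.stack(values)
            attn = F.softmax(K @ q[t] / d ** 0.5, dim=0)
            short = attn @ V
            # Long-term memory readout: query the compressed state.
            long = q[t] @ state
            outputs.append(self.out(short + long))
        return torch.stack(outputs)


if __name__ == "__main__":
    layer = ToyAHNAttention(d_model=32, window=8)
    y = layer(torch.randn(128, 32))
    print(y.shape)  # torch.Size([128, 32])
```

The point of the design is that per-token cost stays bounded by O(window * d + d^2) regardless of sequence length, which is where the reported FLOPs and KV-cache savings come from; the paper's actual AHN modules and training recipe are in the repository linked above.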
Similar Papers
HEMA: A Hippocampus-Inspired Extended Memory Architecture for Long-Context AI Conversations
Computation and Language
Lets computers remember long talks perfectly.
Random Long-Context Access for Mamba via Hardware-aligned Hierarchical Sparse Attention
Computation and Language
Lets computers remember much longer stories.
Native Hybrid Attention for Efficient Sequence Modeling
Computation and Language
Makes AI understand long stories better and faster.